=========================
NXP SJA1105 switch driver
=========================

Overview
========

The NXP SJA1105 is a family of 10 SPI-managed automotive switches:

- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII
- SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and
  100base-TX PHYs
- SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1

Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software, packed with a
CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, while others cannot.
Some tables are mandatory, some are not:

============================= ================== =============================
Table                         Mandatory          Reconfigurable
============================= ================== =============================
Schedule                      no                 no
Schedule entry points         if Scheduling      no
VL Lookup                     no                 no
VL Policing                   if VL Lookup       no
VL Forwarding                 if VL Lookup       no
L2 Lookup                     no                 no
L2 Policing                   yes                no
VLAN Lookup                   yes                yes
L2 Forwarding                 yes                partially (fully on P/Q/R/S)
MAC Config                    yes                partially (fully on P/Q/R/S)
Schedule Params               if Scheduling      no
Schedule Entry Points Params  if Scheduling      no
VL Forwarding Params          if VL Forwarding   no
L2 Lookup Params              no                 partially (fully on P/Q/R/S)
L2 Forwarding Params          yes                no
Clock Sync Params             no                 no
AVB Params                    no                 no
General Params                yes                partially
Retagging                     no                 yes
xMII Params                   yes                no
SGMII                         no                 yes
============================= ================== =============================

Also, the configuration is write-only: software cannot read it back from the
switch, with very few exceptions.

The driver creates a static configuration at probe time and keeps it in
memory at all times, as a shadow of the hardware state. When a hardware
setting needs to change, the static configuration is updated as well. If the
changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise, the switch is reset and
reprogrammed with the updated static configuration.

Switching features
==================

The driver supports the configuration of L2 forwarding rules in hardware for
port bridging. The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge).
The final forwarding decision taken by the hardware is a logical AND of these
two sets of rules.

The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once a VLAN has been assigned, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports are joined to a bridge with
``vlan_filtering 1``.

Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either all have
``vlan_filtering 0``, or all have ``vlan_filtering 1``).

Topology and loop detection through STP is supported.

Offloads
========

Time-aware scheduling
---------------------

The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload (``flags 2``). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
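
The classification rule above can be sketched as a small shell function. This
is a hypothetical illustration, not driver code: the name ``traffic_class`` is
made up, it assumes that PCP value N selects traffic class N (the exact mapping
is not stated here), and it ignores the TPID and management-traffic caveats
discussed next.

```shell
#!/bin/bash

# Hypothetical helper, not part of the sja1105 driver.
# Usage: traffic_class <TCI or empty string> <port default PCP>
# Assumes PCP N maps to traffic class N (0 = lowest, 7 = highest).
traffic_class() {
    local tci="$1" port_default_pcp="$2"

    if [ -z "${tci}" ]; then
        # Untagged frame: fall back to the port-based default.
        echo "${port_default_pcp}"
    else
        # The 16-bit 802.1Q TCI is PCP (3 bits) | DEI (1 bit) | VID (12 bits),
        # so the PCP lives in bits 15..13.
        echo $(( (tci >> 13) & 0x7 ))
    fi
}
```

For example, ``traffic_class 0x6064 0`` (PCP 3, VID 100) prints 3, while
``traffic_class "" 5`` prints the port default, 5.
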

As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that. To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA master port and send normal (0x8100) VLAN-tagged
traffic towards the switch, with the VLAN PCP bits set appropriately.

Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic currently has a value of 7 (highest priority), which is not
configurable in the driver.

Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::

  #!/bin/bash

  set -e -u -o pipefail

  NSEC_PER_SEC="1000000000"

  # Print the hex gate mask with one bit set per listed traffic class.
  gatemask() {
          local tc_list="$1"
          local mask=0

          for tc in ${tc_list}; do
                  mask=$((${mask} | (1 << ${tc})))
          done

          printf "%02x" ${mask}
  }

  if ! systemctl is-active --quiet ptp4l; then
          echo "Please start the ptp4l service"
          exit 1
  fi

  now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
  # Phase-align the base time to the start of the next second.
  sec=$(echo "${now}" | gawk -F. '{ print $1; }')
  base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

  # Identity priority-to-TC map; 100 us for TC 7, then 400 us for TCs 0-6.
  tc qdisc add dev swp5 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S $(gatemask 7) 100000 \
          sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
          flags 2

It is possible to apply the tc-taprio offload on multiple egress ports.
However, the hardware does not allow a gate event to trigger simultaneously on
two ports. The driver checks the consistency of the schedules against this
restriction and returns an error when it is violated. Avoiding such conflicts
requires schedule analysis, which is outside the scope of this document.

Routing actions (redirect, trap, drop)
--------------------------------------

The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
  VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (``vlan_filtering``) cannot be changed
while there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link.
When the action list also contains tc-gate (more details below), the virtual
link becomes "time-critical" (it draws frame buffers from a reserved memory
partition, etc).

The three supported routing actions are "trap", "drop" and "redirect".

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3. This type of key (DA only) is used when the port's VLAN
awareness state is off::

  tc qdisc add dev swp2 clsact
  tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action mirred egress redirect dev swp3 \
          action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0. Keys that include the VLAN ID and PCP are only accepted
when the port is under a VLAN-aware bridge::

  tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
          dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing
---------------------------

The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. At least one redirect or trap action must
accompany it.
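
The rule that tc-gate needs a companion routing action can be checked before
invoking ``tc``. The sketch below is a hypothetical pre-flight helper
(``gate_actions_ok`` is a made-up name, not a tc or driver interface); it only
inspects a list of action names and does not talk to the switch.

```shell
#!/bin/bash

# Hypothetical pre-flight check, not part of tc or the sja1105 driver.
# Usage: gate_actions_ok <action>...   e.g. gate_actions_ok gate trap
# Succeeds unless the list contains "gate" without "redirect" or "trap".
gate_actions_ok() {
    local has_gate=0 has_route=0 action

    for action in "$@"; do
        case "${action}" in
            gate)          has_gate=1 ;;
            redirect|trap) has_route=1 ;;
        esac
    done

    # Without a gate there is nothing to enforce; with one, a
    # redirect or trap action must be present as well.
    (( ! has_gate || has_route ))
}
```

With this sketch, ``gate_actions_ok gate trap`` succeeds, while
``gate_actions_ok gate drop`` fails, since only redirect and trap qualify as
companions of tc-gate.
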

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).
Both schedules use the same 100 us cycle: the receiver's gate is open for
60 us and closed for 40 us, while the sender opens the gate of traffic class 0
for the first 50 us.

Receiver (sja1105)::

  tc qdisc add dev swp2 clsact
  now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo "${now}" | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc filter add dev swp2 ingress flower skip_sw \
          dst_mac 42:be:24:9b:76:20 \
          action gate base-time ${base_time} \
          sched-entry OPEN 60000 -1 -1 \
          sched-entry CLOSE 40000 -1 -1 \
          action trap

Sender::

  now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo "${now}" | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc qdisc add dev eno0 parent root taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S 01 50000 \
          sched-entry S 00 50000 \
          flags 2

The engine used to schedule the ingress gate operations is the same as the
one used for the tc-taprio offload. Therefore, the restriction that no two gate
actions (either tc-gate or tc-taprio gates) may fire at the same time (during
the same 200 ns slot) still applies.

For convenience, it is possible to share time-triggered virtual links across
more than one ingress port, via flow blocks.
In this case, the restriction of firing at the same time does not apply,
because there is a single schedule in the system: that of the shared virtual
link::

  tc qdisc add dev swp2 ingress_block 1 clsact
  tc qdisc add dev swp3 ingress_block 1 clsact
  tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action gate index 2 \
          base-time 0 \
          sched-entry OPEN 50000000 -1 -1 \
          sched-entry CLOSE 50000000 -1 -1 \
          action trap

Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is a sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.

Limitations
===========

The SJA1105 switch family always performs VLAN processing. When configured as
VLAN-unaware, frames carry a different VLAN tag internally, depending on
whether the port is standalone or under a VLAN-unaware bridge.

The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the
driver requires the user to specify the VLAN ID and VLAN PCP only when the
port is under a VLAN-aware bridge. Otherwise, it fills in the VLAN ID and PCP
automatically, based on whether the port is standalone or in a VLAN-unaware
bridge, and accepts only "VLAN-unaware" tc-flower keys (MAC DA).

The existing tc-flower keys that are offloaded using virtual links are no
longer operational after one of the following happens:

- port was standalone and joins a bridge (VLAN-aware or VLAN-unaware)
- port is part of a bridge whose VLAN awareness state changes
- port was part of a bridge and becomes standalone
- port was standalone, but another port joins a VLAN-aware bridge and this
  changes the global VLAN awareness state of the switch

The driver cannot veto all these operations, and it cannot update/remove the
existing tc-flower filters either.
For proper operation, therefore, the tc-flower filters should be installed
only after the forwarding configuration of the port has been made, and removed
by user space before making any changes to it.

Device Tree bindings and board design
=====================================

This section references ``Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml``
and aims to showcase some potential switch caveats.

RMII PHY role and out-of-band signaling
---------------------------------------

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY). But the spec is rather loose and
devices go outside it in several ways. Some PHYs go against the spec and may
provide an output pin where they source the 50 MHz clock themselves, in an
attempt to be helpful.

On the other hand, the SJA1105 is only binary configurable: when in the RMII
MAC role, it will also attempt to drive the clock signal. To prevent it from
doing so, the port must be put in the RMII PHY role. But doing so has some
unintended consequences.

In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are essentially extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The MAC does not have this out-of-band signaling
mechanism defined by the RMII spec.

So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, an RMII PHY-to-PHY connection is inevitably created. The SJA1105
fully emulates a PHY interface and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100base-TX wire.

On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow. This looks like packet loss with some link partners but
not with others.

The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
reference clock if connected to a PHY.

RGMII fixed-link and internal delays
------------------------------------

As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget. When powered up, these can shift the Rx and Tx
clocks with a phase difference between 73.8 and 101.7 degrees.

The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock at the old and at the new frequency. Otherwise, the
lock is lost and the delay lines must be reset (powered down and back up).

In RGMII, the clock frequency changes with link speed (125 MHz at 1000 Mbps,
25 MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during
the AN process. If the switch port is connected through an RGMII fixed-link to
a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).

The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).
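
The phase range quoted above translates into a different absolute delay at
each RGMII clock frequency. The hypothetical helpers below (``rgmii_clk_hz``
and ``phase_to_ps`` are illustrative names, not driver or kernel interfaces)
do that conversion with integer shell arithmetic, taking the phase in tenths
of a degree.

```shell
#!/bin/bash

# Hypothetical helper: RGMII clock frequency in Hz for a link speed in Mbps.
rgmii_clk_hz() {
    case "$1" in
        1000) echo 125000000 ;;
        100)  echo 25000000 ;;
        10)   echo 2500000 ;;
        *)    echo "unsupported speed: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: convert a phase shift, in tenths of a degree, into
# picoseconds of delay at the RGMII clock used for the given link speed.
# Usage: phase_to_ps <speed in Mbps> <phase in tenths of a degree>
phase_to_ps() {
    local speed="$1" deci_deg="$2" hz period_ps

    hz=$(rgmii_clk_hz "${speed}") || return 1
    # Clock period in ps: 10^12 / f.
    period_ps=$(( 1000000000000 / hz ))
    # Fraction of the period: deci_deg / 3600.
    echo $(( period_ps * deci_deg / 3600 ))
}
```

``phase_to_ps 1000 738`` and ``phase_to_ps 1000 1017`` print 1640 and 2260,
so at gigabit the delay lines span roughly 1.64 to 2.26 ns, which includes the
2 ns that corresponds to a 90 degree shift of the 125 MHz clock.
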

As to why a fixed-link interface would ever change link speeds: there are
Ethernet controllers which come out of reset in 100 Mbps mode, and their
driver inevitably needs to change the speed and clock frequency if it is
required to work at gigabit.

MDIO bus and PHY management
---------------------------

The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore, there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
bus). Link state management then works by the driver manually keeping the MAC
link speed in sync (over SPI commands) with the settings negotiated by the PHY.

By comparison, the SJA1110 supports an MDIO slave access point over which its
internal 100base-T1 PHYs can be accessed from the host. This is, however, not
used by the driver; instead, the internal 100base-T1 and 100base-TX PHYs are
accessed through SPI commands, modeled in Linux as virtual MDIO buses.

The microcontroller attached to SJA1110 port 0 also has an MDIO controller
operating in master mode; however, the driver does not support this either,
since the microcontroller gets disabled when the Linux driver operates.
Discrete PHYs connected to the switch ports should have their MDIO interface
attached to an MDIO controller from the host system and not to the switch,
similar to SJA1105.
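
Since link state comes from PHYs on a host MDIO bus, the result can be
inspected from user space through the standard network interface attributes in
sysfs. The sketch below reads ``operstate`` and ``speed`` for each switch port;
the function name is hypothetical, the ``swp*`` naming is an assumption
borrowed from the examples in this document, and the optional sysfs root
argument only exists so the function can be exercised against a fake tree.

```shell
#!/bin/bash

# Hypothetical helper: print "<port> <operstate> <speed in Mbps>" for every
# swp* interface. The port naming is an assumption; the optional argument
# overrides the sysfs root (default /sys/class/net), e.g. for testing.
port_link_report() {
    local root="${1:-/sys/class/net}" dev state speed

    for dev in "${root}"/swp*; do
        # An unmatched glob stays literal, so skip non-existent paths.
        [ -e "${dev}" ] || continue
        state=$(cat "${dev}/operstate" 2>/dev/null) || state="unknown"
        # Reading "speed" can fail while the link is down.
        speed=$(cat "${dev}/speed" 2>/dev/null) || speed="unknown"
        printf '%s %s %s\n' "${dev##*/}" "${state}" "${speed}"
    done
}
```

On a running system, ``port_link_report`` with no argument lists the real
ports, for example ``swp2 up 100`` for a port whose PHY negotiated 100 Mbps.
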

Port compatibility matrix
-------------------------

The SJA1105 port compatibility matrix is:

===== ============== ============== ==============
Port  SJA1105E/T     SJA1105P/Q     SJA1105R/S
===== ============== ============== ==============
0     xMII           xMII           xMII
1     xMII           xMII           xMII
2     xMII           xMII           xMII
3     xMII           xMII           xMII
4     xMII           xMII           SGMII
===== ============== ============== ==============

The SJA1110 port compatibility matrix is:

===== ============== ============== ============== ==============
Port  SJA1110A       SJA1110B       SJA1110C       SJA1110D
===== ============== ============== ============== ==============
0     RevMII (uC)    RevMII (uC)    RevMII (uC)    RevMII (uC)
1     100base-TX     100base-TX     100base-TX
      or SGMII                                     SGMII
2     xMII           xMII           xMII           xMII
      or SGMII                                     or SGMII
3     xMII           xMII           xMII
      or SGMII       or SGMII                      SGMII
      or 2500base-X  or 2500base-X                 or 2500base-X
4     SGMII          SGMII          SGMII          SGMII
      or 2500base-X  or 2500base-X  or 2500base-X  or 2500base-X
5     100base-T1     100base-T1     100base-T1     100base-T1
6     100base-T1     100base-T1     100base-T1     100base-T1
7     100base-T1     100base-T1     100base-T1     100base-T1
8     100base-T1     100base-T1     n/a            n/a
9     100base-T1     100base-T1     n/a            n/a
10    100base-T1     n/a            n/a            n/a
===== ============== ============== ============== ==============
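
For scripts that need to know which interface a given SJA1105 port supports,
the first matrix above can be encoded as a lookup. This is a hypothetical
helper (``sja1105_port_iface`` is a made-up name) covering only the SJA1105
variants; the letter argument is the suffix of the part number.

```shell
#!/bin/bash

# Hypothetical lookup of the SJA1105 port compatibility matrix.
# Usage: sja1105_port_iface <E|T|P|Q|R|S> <port 0-4>
sja1105_port_iface() {
    local variant="$1" port="$2"

    case "${port}" in
        [0-4]) ;;
        *) echo "invalid port: ${port}" >&2; return 1 ;;
    esac

    case "${variant}" in
        E|T|P|Q)
            echo "xMII" ;;
        R|S)
            # Only the R/S variants have SGMII, and only on port 4.
            if [ "${port}" -eq 4 ]; then echo "SGMII"; else echo "xMII"; fi ;;
        *)
            echo "unknown variant: ${variant}" >&2; return 1 ;;
    esac
}
```

For instance, ``sja1105_port_iface R 4`` prints ``SGMII`` and
``sja1105_port_iface Q 4`` prints ``xMII``.
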