.. SPDX-License-Identifier: GPL-2.0

===================
ice devlink support
===================

This document describes the devlink features implemented by the ``ice``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented

    * - Name
      - Mode
      - Notes
    * - ``enable_roce``
      - runtime
      - mutually exclusive with ``enable_iwarp``
    * - ``enable_iwarp``
      - runtime
      - mutually exclusive with ``enable_roce``

Info versions
=============

The ``ice`` driver reports the following versions:

.. list-table:: devlink info versions implemented
    :widths: 5 5 5 90

    * - Name
      - Type
      - Example
      - Description
    * - ``board.id``
      - fixed
      - K65390-000
      - The Product Board Assembly (PBA) identifier of the board.
    * - ``fw.mgmt``
      - running
      - 2.1.7
      - 3-digit version number of the management firmware running on the
        Embedded Management Processor of the device. It controls the PHY,
        link, access to device resources, etc. Intel documentation refers to
        this as the EMP firmware.
    * - ``fw.mgmt.api``
      - running
      - 1.5.1
      - 3-digit version number (major.minor.patch) of the API exported over
        the AdminQ by the management firmware. Used by the driver to
        identify what commands are supported. Historical versions of the
        kernel only displayed a 2-digit version number (major.minor).
    * - ``fw.mgmt.build``
      - running
      - 0x305d955f
      - Unique identifier of the source for the management firmware.
    * - ``fw.undi``
      - running
      - 1.2581.0
      - Version of the Option ROM containing the UEFI driver. The version is
        reported in ``major.minor.patch`` format. The major version is
        incremented whenever a major breaking change occurs, or when the
        minor version would overflow. The minor version is incremented for
        non-breaking changes and reset to 1 when the major version is
        incremented. The patch version is normally 0 but is incremented when
        a fix is delivered as a patch against an older base Option ROM.
    * - ``fw.psid.api``
      - running
      - 0.80
      - Version defining the format of the flash contents.
    * - ``fw.bundle_id``
      - running
      - 0x80002ec0
      - Unique identifier of the firmware image file that was loaded onto
        the device. Also referred to as the EETRACK identifier of the NVM.
    * - ``fw.app.name``
      - running
      - ICE OS Default Package
      - The name of the DDP package that is active in the device. The DDP
        package is loaded by the driver during initialization. Each
        variation of the DDP package has a unique name.
    * - ``fw.app``
      - running
      - 1.3.1.0
      - The version of the DDP package that is active in the device. Note
        that both the name (as reported by ``fw.app.name``) and version are
        required to uniquely identify the package.
    * - ``fw.app.bundle_id``
      - running
      - 0xc0000001
      - Unique identifier for the DDP package loaded in the device. Also
        referred to as the DDP Track ID. Can be used to uniquely identify
        the specific DDP package.
    * - ``fw.netlist``
      - running
      - 1.1.2000-6.7.0
      - The version of the netlist module. This module defines the device's
        Ethernet capabilities and default settings, and is used by the
        management firmware as part of managing link and device
        connectivity.
    * - ``fw.netlist.build``
      - running
      - 0xee16ced7
      - The first 4 bytes of the hash of the netlist module contents.

Flash Update
============

The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface. It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.

.. list-table:: List of supported overwrite modes
    :widths: 5 95

    * - Bits
      - Behavior
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
      - Do not preserve settings stored in the flash components being
        updated. This includes overwriting the port configuration that
        determines the number of physical functions the device will
        initialize with.
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
      - Do not preserve either settings or identifiers. Overwrite everything
        in the flash with the contents from the provided image, without
        performing any preservation. This includes overwriting device
        identifying fields such as the MAC address, VPD area, and device
        serial number. It is expected that this combination be used with an
        image customized for the specific device.

The ice hardware does not support overwriting only identifiers while
preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its
own will be rejected. If no overwrite mask is provided, the firmware will be
instructed to preserve all settings and identifying fields when updating.

Reload
======

The ``ice`` driver supports activating new firmware after a flash update
using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE``
action.

.. code:: shell

    $ devlink dev reload pci/0000:01:00.0 reload action fw_activate

The new firmware is activated by issuing a device-specific Embedded
Management Processor reset which requests the device to reset and reload the
EMP firmware image.

The driver does not currently support reloading the driver via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.

Port split
==========

The ``ice`` driver supports port splitting only for port 0, as the FW has
a predefined set of available port split options for the whole device.

A system reboot is required for port split to be applied.

The following command will select the port split option with 4 ports:

.. code:: shell

    $ devlink port split pci/0000:16:00.0/0 count 4

The list of all available port options is printed to the dynamic debug log
after each ``split`` and ``unsplit`` command. The first option is the default.

.. code:: shell

    ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
    ice 0000:16:00.0: Status  Split      Quad 0          Quad 1
    ice 0000:16:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
    ice 0000:16:00.0: Active  2     100   -   -   - 100   -   -   -
    ice 0000:16:00.0:         2      50   -  50   -   -   -   -   -
    ice 0000:16:00.0: Pending 4      25  25  25  25   -   -   -   -
    ice 0000:16:00.0:         4      25  25   -   -  25  25   -   -
    ice 0000:16:00.0:         8      10  10  10  10  10  10  10  10
    ice 0000:16:00.0:         1     100   -   -   -   -   -   -   -

There can be multiple FW port options with the same port split count. When
the same port split count request is issued again, the next FW port option with
the same port split count will be selected.

``devlink port unsplit`` will select the option with a split count of 1. If
there is no FW option available with split count 1, an error is returned.

Regions
=======

The ``ice`` driver implements the following regions for accessing internal
device data.

.. list-table:: regions implemented
    :widths: 15 85

    * - Name
      - Description
    * - ``nvm-flash``
      - The contents of the entire flash chip, sometimes referred to as
        the device's Non Volatile Memory.
    * - ``shadow-ram``
      - The contents of the Shadow RAM, which is loaded from the beginning
        of the flash. Although the contents are primarily from the flash,
        this area also contains data generated during device boot which is
        not stored in flash.
    * - ``device-caps``
      - The contents of the device firmware's capabilities buffer. Useful to
        determine the current state and configuration of the device.

Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot as the contents are
sent by firmware and can't be split into separate reads.

Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.

.. code:: shell

    $ devlink region show
    pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
    pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10

    $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

    $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

    $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region new pci/0000:01:00.0/device-caps snapshot 1
    $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1
    0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
    0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00
    0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00
    0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00
    00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00
    00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00
    0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00
    0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00
    00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1

Devlink Rate
============

The ``ice`` driver implements the devlink-rate API. It allows for offload of
the Hierarchical QoS to the hardware.
It enables the user to group Virtual Functions in a tree structure and
assign the supported parameters (tx_share, tx_max, tx_priority and
tx_weight) to each node in the tree. This effectively gives the user
control over how much bandwidth is allocated to each VF group. The
allocation is then enforced by the HW.

It is assumed that this feature is mutually exclusive with DCB performed
in FW and ADQ, or any driver feature that would trigger changes in QoS,
for example creation of a new traffic class. The driver will prevent DCB
or ADQ configuration once the user has started making changes to the nodes
using the devlink-rate API. To configure those features, a driver reload is
necessary. Correspondingly, if ADQ or DCB is already configured, the driver
won't export the hierarchy at all, and it will remove an untouched
hierarchy if those features are enabled after the hierarchy is exported
but before any changes are made.

This feature also depends on switchdev being enabled in the system.
This is required because devlink-rate requires devlink-port objects to be
present, and those objects are only created in switchdev mode.

If the driver is set to switchdev mode, it will export the internal
hierarchy the moment VFs are created. The root of the tree is always
represented by node_0. This node can't be deleted by the user. Leaf
nodes and nodes with children also can't be deleted.

.. list-table:: Attributes supported
    :widths: 15 85

    * - Name
      - Description
    * - ``tx_max``
      - Maximum bandwidth to be consumed by the tree node. The rate limit
        is an absolute number specifying the maximum number of bytes a node
        may consume in one second. A rate limit guarantees that a link will
        not oversaturate the receiver on the remote end and also enforces
        an SLA between the subscriber and the network provider.
    * - ``tx_share``
      - Minimum bandwidth allocated to a tree node when it is not blocked.
        It specifies an absolute BW. While tx_max defines the maximum
        bandwidth the node may consume, tx_share marks the committed BW
        for the node.
    * - ``tx_priority``
      - Allows use of a strict priority arbiter among siblings. This
        arbitration scheme attempts to schedule nodes based on their
        priority as long as the nodes remain within their bandwidth limit.
        Range 0-7. Nodes with priority 7 have the highest priority and are
        selected first, while nodes with priority 0 have the lowest
        priority. Nodes that have the same priority are treated equally.
    * - ``tx_weight``
      - Allows use of a Weighted Fair Queuing arbitration scheme among
        siblings. This arbitration scheme can be used simultaneously with
        strict priority. Range 1-200. Only relative values matter for
        arbitration.

``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
nodes with the same priority form a WFQ subgroup in the sibling group
and arbitration among them is based on assigned weights.

.. code:: shell

    # enable switchdev
    $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev

    # at this point driver should export internal hierarchy
    $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs

    $ devlink port function rate show
    pci/0000:4b:00.0/node_25: type node parent node_24
    pci/0000:4b:00.0/node_24: type node parent node_0
    pci/0000:4b:00.0/node_32: type node parent node_31
    pci/0000:4b:00.0/node_31: type node parent node_30
    pci/0000:4b:00.0/node_30: type node parent node_16
    pci/0000:4b:00.0/node_19: type node parent node_18
    pci/0000:4b:00.0/node_18: type node parent node_17
    pci/0000:4b:00.0/node_17: type node parent node_16
    pci/0000:4b:00.0/node_14: type node parent node_5
    pci/0000:4b:00.0/node_5: type node parent node_3
    pci/0000:4b:00.0/node_13: type node parent node_4
    pci/0000:4b:00.0/node_12: type node parent node_4
    pci/0000:4b:00.0/node_11: type node parent node_4
    pci/0000:4b:00.0/node_10: type node parent node_4
    pci/0000:4b:00.0/node_9: type node parent node_4
    pci/0000:4b:00.0/node_8: type node parent node_4
    pci/0000:4b:00.0/node_7: type node parent node_4
    pci/0000:4b:00.0/node_6: type node parent node_4
    pci/0000:4b:00.0/node_4: type node parent node_3
    pci/0000:4b:00.0/node_3: type node parent node_16
    pci/0000:4b:00.0/node_16: type node parent node_15
    pci/0000:4b:00.0/node_15: type node parent node_0
    pci/0000:4b:00.0/node_2: type node parent node_1
    pci/0000:4b:00.0/node_1: type node parent node_0
    pci/0000:4b:00.0/node_0: type node
    pci/0000:4b:00.0/1: type leaf parent node_25
    pci/0000:4b:00.0/2: type leaf parent node_25

    # create a custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0

    # create a second custom node below the first one
    $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom

    # reassign the second VF to the newly created branch
    $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1

    # assign tx_weight to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5

    # assign tx_share to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps
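
The way ``tx_priority`` and ``tx_weight`` combine among siblings can be
illustrated with a small model. The Python sketch below is not driver or
firmware code. It assumes a simplified arbiter in which every sibling
always has traffic pending and stays within its bandwidth limits, so only
the highest-priority WFQ subgroup is served and bandwidth is divided in
proportion to the weights. ``Node`` and ``wfq_shares`` are hypothetical
names used for illustration only, and ``tx_share`` and ``tx_max`` are
ignored.

.. code:: python

    from dataclasses import dataclass
    from fractions import Fraction


    @dataclass(frozen=True)
    class Node:
        """Hypothetical stand-in for one sibling in a devlink-rate group."""
        name: str
        tx_priority: int = 0  # 0-7, priority 7 is selected first
        tx_weight: int = 1    # 1-200, only relative values matter


    def wfq_shares(siblings):
        """Return {name: share} for the highest-priority WFQ subgroup.

        Simplified model: all siblings are assumed to be backlogged, so
        siblings with a lower priority receive nothing and are omitted.
        """
        if not siblings:
            return {}
        top = max(node.tx_priority for node in siblings)
        group = [node for node in siblings if node.tx_priority == top]
        total = sum(node.tx_weight for node in group)
        return {node.name: Fraction(node.tx_weight, total) for node in group}


    siblings = [
        Node("vf1", tx_priority=7, tx_weight=5),
        Node("vf2", tx_priority=7, tx_weight=15),
        Node("vf3", tx_priority=0, tx_weight=200),
    ]
    shares = wfq_shares(siblings)
    for name, share in shares.items():
        print(f"{name}: {share}")

In this model ``vf1`` and ``vf2`` split the bandwidth 1/4 to 3/4, and
weights of 1 and 3 would give the same split, which is what "only relative
values matter" means. ``vf3`` is not served despite its large weight
because its priority is lower.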