.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

==================
Kernel TLS offload
==================

Kernel TLS operation
====================

The Linux kernel provides TLS connection offload infrastructure. Once a TCP
connection is in ``ESTABLISHED`` state user space can enable the TLS Upper
Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.

``ktls`` can operate in three modes:

 * Software crypto mode (``TLS_SW``) - the CPU handles the cryptography.
   In most basic cases only crypto operations synchronous with the CPU
   can be used, but depending on calling context the CPU may utilize
   asynchronous crypto accelerators. The use of accelerators introduces extra
   latency on socket reads (decryption only starts when a read syscall
   is made) and additional I/O load on the system.
 * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
   on a packet-by-packet basis, provided the packets arrive in order.
   This mode integrates best with the kernel stack and is described in detail
   in the remaining part of this document
   (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
 * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
   the NIC driver and firmware replace the kernel networking stack
   with their own TCP handling. It is not usable in production environments
   which make use of the Linux networking stack, for example any firewalling
   abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).

The operation mode is selected automatically based on device configuration;
offload opt-in or opt-out on a per-connection basis is not currently supported.

TX
--

At a high level user write requests are turned into a scatter list, the TLS ULP
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
mode) and then hands the modified scatter list to the TCP layer. From this
point on the TCP stack proceeds as normal.

In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
Instead packets reach a device driver, the driver will mark the packets
for crypto offload based on the socket the packet is attached to,
and send them to the device for encryption and transmission.

RX
--

On the receive side if the device handled decryption and authentication
successfully, the driver will set the decrypted bit in the associated
:c:type:`struct sk_buff <sk_buff>`.
The packets reach the TCP stack and
are handled normally. ``ktls`` is informed when data is queued to the socket
and the ``strparser`` mechanism is used to delineate the records. Upon read
request, records are retrieved from the socket and passed to the decryption
routine. If the device decrypted all the segments of the record the decryption
is skipped, otherwise the software path handles decryption.

.. kernel-figure:: tls-offload-layers.svg
   :alt: TLS offload layers
   :align: center
   :figwidth: 28em

   Layers of Kernel TLS stack

Device configuration
====================

During driver initialization the device sets the ``NETIF_F_HW_TLS_RX`` and
``NETIF_F_HW_TLS_TX`` features and installs its
:c:type:`struct tlsdev_ops <tlsdev_ops>`
pointer in the :c:member:`tlsdev_ops` member of the
:c:type:`struct net_device <net_device>`.

When TLS cryptographic connection state is installed on a ``ktls`` socket
(note that it is done twice, once for RX and once for TX direction,
and the two are completely independent), the kernel checks if the underlying
network device is offload-capable and attempts the offload. In case offload
fails the connection is handled entirely in software using the same mechanism
as if the offload was never tried.

Offload request is performed via the :c:member:`tls_dev_add` callback of
:c:type:`struct tlsdev_ops <tlsdev_ops>`:

.. code-block:: c

    int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
                       enum tls_offload_ctx_dir direction,
                       struct tls_crypto_info *crypto_info,
                       u32 start_offload_tcp_sn);

``direction`` indicates whether the cryptographic information is for
the received or transmitted packets. Driver uses the ``sk`` parameter
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
Cryptographic information in ``crypto_info`` includes the key, iv, salt
as well as TLS record sequence number. ``start_offload_tcp_sn`` indicates
which TCP sequence number corresponds to the beginning of the record with
sequence number from ``crypto_info``. The driver can add its state
at the end of kernel structures (see :c:member:`driver_state` members
in ``include/net/tls.h``) to avoid additional allocations and pointer
dereferences.

TX
--

After TX state is installed, the stack guarantees that the first segment
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
number, simplifying TCP sequence number matching.
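
As a rough illustration of the sequence number matching mentioned above,
a driver might classify each outgoing segment relative to
``start_offload_tcp_sn`` using a wrap-safe comparison. The sketch below is
plain user-space C rather than kernel code. ``seq_before()`` uses the usual
signed-difference idiom for 32-bit TCP sequence numbers, while
``enum tx_seg_class`` and ``classify_tx_segment()`` are hypothetical names
invented for this example and are not part of the kernel API.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical classification of a TX segment, for illustration only. */
    enum tx_seg_class {
        TX_SEG_BEFORE_OFFLOAD,   /* starts before the offload start point */
        TX_SEG_AT_OFFLOAD_START, /* starts exactly at start_offload_tcp_sn */
        TX_SEG_AFTER_START,      /* starts past the offload start point */
    };

    /* True if seq1 precedes seq2, tolerating 32-bit wrap-around. */
    static bool seq_before(uint32_t seq1, uint32_t seq2)
    {
        return (int32_t)(seq1 - seq2) < 0;
    }

    static enum tx_seg_class classify_tx_segment(uint32_t seg_seq,
                                                 uint32_t start_offload_tcp_sn)
    {
        if (seg_seq == start_offload_tcp_sn)
            return TX_SEG_AT_OFFLOAD_START;
        if (seq_before(seg_seq, start_offload_tcp_sn))
            return TX_SEG_BEFORE_OFFLOAD;
        return TX_SEG_AFTER_START;
    }

A segment classified as ``TX_SEG_BEFORE_OFFLOAD`` in this sketch would carry
no kernel record information and would have to bypass the crypto offload.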

TX offload being fully initialized does not imply that all segments passing
through the driver and which belong to the offloaded socket will be after
the expected sequence number and will have kernel record information.
In particular, already encrypted data may have been queued to the socket
before installing the connection state in the kernel.

RX
--

In the RX direction the local networking stack has little control over the
segmentation, so the initial record's TCP sequence number may be anywhere
inside the segment.

Normal operation
================

At the minimum the device maintains the following state for each connection, in
each direction:

 * crypto secrets (key, iv, salt)
 * crypto processing state (partial blocks, partial authentication tag, etc.)
 * record metadata (sequence number, processing offset and length)
 * expected TCP sequence number

There are no guarantees on record length or record segmentation. In particular
segments may start at any point of a record and contain any number of records.
Assuming segments are received in order, the device should be able to perform
crypto operations and authentication regardless of segmentation.
For this to be possible the device has to keep a small amount of
segment-to-segment state.
This includes at least:

 * partial headers (if a segment carried only a part of the TLS header)
 * partial data block
 * partial authentication tag (all data had been seen but part of the
   authentication tag has to be written or read from the subsequent segment)

Record reassembly is not necessary for TLS offload. If the packets arrive
in order the device should be able to handle them separately and make
forward progress.

TX
--

The kernel stack performs record framing reserving space for the authentication
tag and populating all other TLS header and trailer fields.

Both the device and the driver maintain expected TCP sequence numbers
due to the possibility of retransmissions and the lack of software fallback
once the packet reaches the device.
For segments passed in order, the driver marks the packets with
a connection identifier (note that a 5-tuple lookup is insufficient to identify
packets requiring HW offload, see the :ref:`5tuple_problems` section)
and hands them to the device. The device identifies the packet as requiring
TLS handling and confirms the sequence number matches its expectation.
The device performs encryption and authentication of the record data.
It replaces the authentication tag and TCP checksum with correct values.

RX
--

Before a packet is DMAed to the host (but after NIC's embedded switching
and packet transformation functions) the device validates the Layer 4
checksum and performs a 5-tuple lookup to find any TLS connection the packet
may belong to (technically a 4-tuple
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
is always TCP). If connection is matched device confirms if the TCP sequence
number is the expected one and proceeds to TLS handling (record delineation,
decryption, authentication for each record in the packet). The device leaves
the record framing unmodified, the stack takes care of record decapsulation.
Device indicates successful handling of TLS offload in the per-packet context
(descriptor) passed to the host.

Upon reception of a TLS offloaded packet, the driver sets
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
corresponding to the segment. Networking stack makes sure decrypted
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
and takes care of partial decryption.
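
The receive-side checks described in this section (checksum validation,
4-tuple lookup, expected TCP sequence number) can be summarized in a small
user-space sketch. It is only a model under stated assumptions: the crypto
itself is left out, the connection table is a flat array, only IPv4 is
covered, and every name in it (``struct flow_key``, ``struct rx_conn``,
``rx_segment_offloadable()``) is hypothetical rather than taken from any
driver or from the kernel.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical IPv4 4-tuple; the protocol is implied to be TCP. */
    struct flow_key {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
    };

    /* Hypothetical per-connection RX state kept by the device. */
    struct rx_conn {
        struct flow_key key;
        uint32_t expected_seq; /* next in-order TCP sequence number */
    };

    static bool flow_key_equal(const struct flow_key *a,
                               const struct flow_key *b)
    {
        return a->saddr == b->saddr && a->daddr == b->daddr &&
               a->sport == b->sport && a->dport == b->dport;
    }

    /*
     * Returns true if the segment belongs to a known connection, has a
     * valid checksum and arrives in order, i.e. it is a candidate for TLS
     * handling. On success the expected sequence number is advanced.
     * Out-of-order handling and resync are outside this sketch.
     */
    static bool rx_segment_offloadable(struct rx_conn *table, size_t n,
                                       const struct flow_key *key,
                                       uint32_t seq, uint32_t payload_len,
                                       bool csum_ok)
    {
        if (!csum_ok)
            return false;

        for (size_t i = 0; i < n; i++) {
            if (!flow_key_equal(&table[i].key, key))
                continue;
            if (seq != table[i].expected_seq)
                return false;
            table[i].expected_seq += payload_len;
            return true;
        }
        return false; /* no matching connection */
    }

In this model a ``false`` return corresponds to passing the packet to the
host unmodified and without the decrypted mark.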

Resync handling
===============

In presence of packet drops or network packet reordering, the device may lose
synchronization with the TLS stream, and require a resync with the kernel's
TCP stack.

Note that resync is only attempted for connections which were successfully
added to the device table and are in ``TLS_HW`` mode. For example,
if the table was full when cryptographic state was installed in the kernel,
such connection will never get offloaded. Therefore the resync request
does not carry any cryptographic connection state.

TX
--

Segments transmitted from an offloaded socket can get out of sync
in similar ways to the receive side: retransmissions and local drops
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
This means most likely that the part of the record preceding the current
segment has to be passed to the device as part of the packet context,
together with its TCP sequence number and TLS record number.
The device
can then initialize its crypto state, process and discard the preceding
data (to be able to insert the authentication tag) and move onto handling
the actual packet.

In this mode depending on the implementation the driver can either ask
for a continuation with the crypto state and the new sequence number
(next expected segment is the one after the out of order one), or continue
with the previous stream state - assuming that the out of order segment
was just a retransmission. The former is simpler, and does not require
retransmission detection, therefore it is the recommended method until
such time as it is proven inefficient.

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, it
may imply a local drop, the driver asks the stack to sync the device
to the next record state and falls back to software.

Resync request is indicated with:

.. code-block:: c

    void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq);

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

    bool tls_offload_tx_resync_pending(struct sock *sk);

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. Stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).

RX
--

A small amount of RX reorder events may not require a full resynchronization.
In particular the device should not lose synchronization
when record boundary can be recovered:

.. kernel-figure:: tls-offload-reorder-good.svg
   :alt: reorder of non-header segment
   :align: center

   Reorder of non-header segment

Green segments are successfully decrypted, blue ones are passed
as received on wire, red stripes mark start of new records.

In above case segment 1 is received and decrypted successfully.
Segment 2 was dropped so 3 arrives out of order. The device knows
the next record starts inside 3, based on record length in segment 1.
Segment 3 is passed untouched, because due to lack of data from segment 2
the remainder of the previous record inside segment 3 cannot be handled.
The device can, however, collect the authentication algorithm's state
and partial block from the new record in segment 3 and when 4 and 5
arrive continue decryption. Finally when 2 arrives it's completely outside
of expected window of the device so it's passed as is without special
handling. ``ktls`` software fallback handles the decryption of record
spanning segments 1, 2 and 3. The device did not get out of sync,
even though two segments did not get decrypted.

Kernel synchronization may be necessary if the lost segment contained
a record header and arrived after the next record header has already passed:

.. kernel-figure:: tls-offload-reorder-bad.svg
   :alt: reorder of header segment
   :align: center

   Reorder of segment with a TLS header

In this example segment 2 gets dropped, and it contains a record header.
Device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
in the SSL/TLS version field of the header. Once pattern is matched
the device continues attempting parsing headers at expected locations
(based on the length fields at guessed locations).
Whenever the expected location does not contain a valid header the scan
is restarted.

When the header is matched the device sends a confirmation request
to the kernel, asking if the guessed location is correct (if a TLS record
really starts there), and which record sequence number the given header had.
The kernel confirms the guessed location was correct and tells the device
the record sequence number. Meanwhile, the device had been parsing
and counting all records since the just-confirmed one, it adds the number
of records it had seen to the record number provided by the kernel.
At this point the device is in sync and can resume decryption at next
segment boundary.
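
The header scan described above can be sketched in user-space C as follows.
This is a simplified model, not how any particular device implements it:
it works on one contiguous buffer instead of a segmented TCP stream, it
treats a single chained header as sufficient evidence, and it does not model
the confirmation exchange with the kernel. The function names and the
``TLS_MAX_REC_LEN`` bound (2^14 + 2048 bytes, the TLS 1.2 ciphertext limit,
used here as an assumed sanity check) are choices made for this example.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define TLS_HDR_LEN     5              /* type, version (2), length (2) */
    #define TLS_MAX_REC_LEN (16384 + 2048) /* assumed length sanity bound */

    /* Record length field of a header assumed to start at buf[off]. */
    static size_t tls_hdr_rec_len(const uint8_t *buf, size_t off)
    {
        return ((size_t)buf[off + 3] << 8) | buf[off + 4];
    }

    /* Does buf[off..] look like a TLS 1.2/1.3 record header? */
    static bool tls_hdr_plausible(const uint8_t *buf, size_t len, size_t off)
    {
        if (len < TLS_HDR_LEN || off > len - TLS_HDR_LEN)
            return false;
        if (buf[off + 1] != 0x03 || buf[off + 2] != 0x03)
            return false;
        return tls_hdr_rec_len(buf, off) <= TLS_MAX_REC_LEN;
    }

    /*
     * Scan for a guessed header whose length field points at another
     * plausible header. Returns the offset of the first such header,
     * or -1 if none is found and the scan has to continue.
     */
    static long tls_scan_for_header(const uint8_t *buf, size_t len)
    {
        for (size_t off = 0; off + TLS_HDR_LEN <= len; off++) {
            size_t next;

            if (!tls_hdr_plausible(buf, len, off))
                continue;
            next = off + TLS_HDR_LEN + tls_hdr_rec_len(buf, off);
            if (tls_hdr_plausible(buf, len, next))
                return (long)off;
        }
        return -1;
    }

A match returned by ``tls_scan_for_header()`` is still only a guess; in the
scheme described above it would be the location the device asks the kernel
to confirm.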

In a pathological case the device may latch onto a sequence of matching
headers and never hear back from the kernel (there is no negative
confirmation from the kernel). The implementation may choose to periodically
restart the scan. Given how unlikely a falsely-matching stream is, however,
periodic restart is not deemed necessary.

Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and record may get processed
by the kernel before the confirmation request.

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
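
The back off schedule described above can be modelled with a small counter.
The sketch below is a user-space illustration with hypothetical names
(``struct resync_backoff``, ``resync_backoff_record()``), not the kernel's
implementation. It assumes that the record count restarts after every resync
attempt and that any record which was not fully encrypted resets the
schedule; the text above does not spell out either detail.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define RESYNC_BACKOFF_START 2   /* first attempt after 2 records */
    #define RESYNC_BACKOFF_MAX   128 /* then at most every 128 records */

    /* Hypothetical per-connection back off state. */
    struct resync_backoff {
        uint32_t encrypted_run; /* encrypted records since last attempt */
        uint32_t threshold;     /* records to wait before next attempt */
    };

    static void resync_backoff_init(struct resync_backoff *b)
    {
        b->encrypted_run = 0;
        b->threshold = RESYNC_BACKOFF_START;
    }

    /*
     * Account for one received record. Returns true when a resync
     * attempt should be scheduled.
     */
    static bool resync_backoff_record(struct resync_backoff *b,
                                      bool fully_encrypted)
    {
        if (!fully_encrypted) {
            /* Assumption: progress by the device resets the schedule. */
            resync_backoff_init(b);
            return false;
        }
        if (++b->encrypted_run < b->threshold)
            return false;

        b->encrypted_run = 0;
        if (b->threshold < RESYNC_BACKOFF_MAX)
            b->threshold *= 2;
        return true;
    }

With these assumptions attempts are scheduled after 2, 4, 8, 16, 32, 64 and
then every 128 consecutive fully encrypted records.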

Error handling
==============

TX
--

Packets may be redirected or rerouted by the stack to a different
device than the selected TLS offload device. The stack will handle
such condition using the :c:func:`sk_validate_xmit_skb` helper
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
Offload maintains information about all records until the data is
fully acknowledged, so if skbs reach the wrong device they can be handled
by software fallback.

Any device TLS offload handling error on the transmission side must result
in the packet being dropped. For example if a packet got out of order
due to a bug in the stack or the device, reached the device and can't
be encrypted such packet must be dropped.

RX
--

If the device encounters any problems with TLS offload on the receive
side it should pass the packet to the host's networking stack as it was
received on the wire.

For example authentication failure for any record in the segment should
result in passing the unmodified packet to the software fallback. This means
packets should not be modified "in place". Splitting segments to handle partial
decryption is not advised.
In other words either all records in the packet
had been handled successfully and authenticated or the packet has to be passed
to the host's stack as it was on the wire (recovering original packet in the
driver if device provides precise error is sufficient).

The Linux networking stack does not provide a way of reporting per-packet
decryption and authentication errors; packets with errors must simply not
have the :c:member:`decrypted` mark set.

A packet should also not be handled by the TLS offload if it contains
incorrect checksums.

Performance metrics
===================

TLS offload can be characterized by the following basic metrics:

 * max connection count
 * connection installation rate
 * connection installation latency
 * total cryptographic performance

Note that each TCP connection requires a TLS session in both directions;
the performance may be reported treating each direction separately.

Max connection count
--------------------

The number of connections a device can support can be exposed via
the ``devlink resource`` API.

Total cryptographic performance
-------------------------------

Offload performance may depend on segment and record size.

Overload of the cryptographic subsystem of the device should not have
significant performance impact on non-offloaded streams.

Statistics
==========

The following minimum set of TLS-related statistics should be reported
by the driver:

 * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
   which were part of a TLS stream.
 * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
   which were successfully decrypted.
 * ``rx_tls_ctx`` - number of TLS RX HW offload contexts added to device for
   decryption.
 * ``rx_tls_del`` - number of TLS RX HW offload contexts deleted from device
   (connection has finished).
 * ``rx_tls_resync_req_pkt`` - number of received TLS packets with a resync
   request.
 * ``rx_tls_resync_req_start`` - number of times the TLS async resync request
   was started.
 * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
   properly ended with providing the HW tracked tcp-seq.
 * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
   procedure was started but not properly ended.
 * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
   the driver was successfully handled.
 * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
   the driver was terminated unsuccessfully.
 * ``rx_tls_err`` - number of RX packets which were part of a TLS stream
   but were not decrypted due to unexpected error in the state machine.
 * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
   for encryption of their TLS payload.
 * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
   passed to the device for encryption.
 * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to device for
   encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
   but did not arrive in the expected order.
 * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
   a TLS stream and arrived out-of-order, but skipped the HW offload routine
   and went to the regular transmit flow as they were retransmissions of the
   connection handshake.
 * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
   a TLS stream dropped, because they arrived out of order and associated
   record could not be found.
 * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
   stream and were dropped because they contain both data that has been
   encrypted by software and data that expects hardware crypto offload.

Notable corner cases, exceptions and additional requirements
============================================================

.. _5tuple_problems:

5-tuple matching limitations
----------------------------

The device can only recognize received packets based on the 5-tuple
of the socket. The current ``ktls`` implementation will not offload sockets
routed through software interfaces such as those used for tunneling
or virtual networking. However, many packet transformations performed
by the networking stack (most notably any BPF logic) do not require
any intermediate software device, therefore a 5-tuple match may
consistently miss at the device level. In such cases the device
should still be able to perform TX offload (encryption) and should
fall back cleanly to software decryption (RX).

Out of order
------------

Introducing extra processing in NICs should not cause packets to be
transmitted or received out of order, for example pure ACK packets
should not be reordered with respect to data segments.

Ingress reorder
---------------

A device is permitted to perform packet reordering for consecutive
TCP segments (i.e. placing packets in the correct order) but any form
of additional buffering is disallowed.

Coexistence with standard networking offload features
-----------------------------------------------------

Offloaded ``ktls`` sockets should support standard TCP stack features
transparently. Enabling device TLS offload should not cause any difference
in packets as seen on the wire.

Transport layer transparency
----------------------------

The device should not modify any packet headers for the purpose
of simplifying TLS offload.

The device should not depend on any packet headers beyond what is strictly
necessary for TLS offload.

Segment drops
-------------

Dropping packets is acceptable only in the event of catastrophic
system errors and should never be used as an error handling mechanism
in cases arising from normal operation. In other words, reliance
on TCP retransmissions to handle corner cases is not acceptable.

TLS device features
-------------------

Drivers should ignore the changes to the TLS device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads; old connections will remain active after flags are cleared.

TLS encryption cannot be offloaded to devices without checksum calculation
offload. Hence, the TLS TX device feature flag requires TX csum offload to
be set. Disabling the latter implies clearing the former. Disabling TX
checksum offload should not affect old connections, and drivers should make
sure checksum calculation does not break for them.
Similarly, device-offloaded TLS decryption implies doing RXCSUM. If the user
does not want to enable RX csum offload, the TLS RX device feature is disabled
as well.
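
The dependency between the TLS and checksum feature flags described above can
be illustrated with a small user-space model. This is only a sketch: the
``FEAT_*`` bits and the ``tls_fix_features()`` helper are hypothetical names
invented for this example and are not the kernel's actual feature flags or
functions. The model covers only which features may be newly enabled; it says
nothing about already-offloaded connections, which remain active.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch only; they are not the
 * kernel's real feature flag definitions. */
#define FEAT_TX_CSUM (1u << 0)
#define FEAT_RX_CSUM (1u << 1)
#define FEAT_TLS_TX  (1u << 2)
#define FEAT_TLS_RX  (1u << 3)

/* Return the requested feature set with each TLS bit cleared when the
 * checksum offload it depends on is not enabled. */
static uint32_t tls_fix_features(uint32_t requested)
{
	uint32_t features = requested;

	/* TLS TX offload requires TX checksum offload. */
	if ((features & FEAT_TLS_TX) && !(features & FEAT_TX_CSUM))
		features &= ~FEAT_TLS_TX;

	/* TLS RX offload requires RX checksum offload. */
	if ((features & FEAT_TLS_RX) && !(features & FEAT_RX_CSUM))
		features &= ~FEAT_TLS_RX;

	return features;
}

/* Usage: exercise the dependency rules from the text. */
static void tls_fix_features_selfcheck(void)
{
	/* Disabling TX csum implies clearing TLS TX. */
	assert(tls_fix_features(FEAT_TLS_TX) == 0);

	/* With TX csum enabled, TLS TX is kept. */
	assert(tls_fix_features(FEAT_TLS_TX | FEAT_TX_CSUM) ==
	       (FEAT_TLS_TX | FEAT_TX_CSUM));

	/* Without RX csum, TLS RX is dropped; unrelated bits survive. */
	assert(tls_fix_features(FEAT_TLS_RX | FEAT_TX_CSUM) == FEAT_TX_CSUM);
}
```

In this model the TX and RX directions are checked independently, mirroring
the text: clearing one checksum offload only affects the TLS feature for the
same direction.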