.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

==================
Kernel TLS offload
==================

Kernel TLS operation
====================

The Linux kernel provides TLS connection offload infrastructure. Once a TCP
connection is in the ``ESTABLISHED`` state, user space can enable the TLS Upper
Layer Protocol (ULP) and install the cryptographic connection state.
For details regarding the user-facing interface refer to the TLS
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
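
As a rough illustration of the two user space steps mentioned above, the
following sketch enables the ULP and installs TX state. It assumes TLS 1.2
with AES-GCM-128 and key material already negotiated by a user space TLS
library; ``ktls_enable_tx()`` is a hypothetical helper name, and the fallback
``#define`` values are only used if the installed headers lack them.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <linux/tls.h>

  #ifndef TCP_ULP
  #define TCP_ULP 31	/* fallback, value from the kernel UAPI headers */
  #endif
  #ifndef SOL_TLS
  #define SOL_TLS 282	/* fallback, value from the kernel UAPI headers */
  #endif

  /*
   * Hypothetical helper: enable the TLS ULP on an ESTABLISHED TCP socket
   * and install the TX direction state. Returns 0 on success, -1 on error
   * (errno is set by setsockopt()).
   */
  int ktls_enable_tx(int fd, const unsigned char key[16],
                     const unsigned char iv[8],
                     const unsigned char salt[4],
                     const unsigned char rec_seq[8])
  {
      struct tls12_crypto_info_aes_gcm_128 ci;

      /* Step 1: attach the "tls" ULP to the socket. */
      if (setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
          return -1;

      /* Step 2: describe the cryptographic connection state. */
      memset(&ci, 0, sizeof(ci));
      ci.info.version = TLS_1_2_VERSION;
      ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
      memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
      memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
      memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
      memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

      /* Step 3: install it for the TX direction; RX is a separate call. */
      return setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci));
  }

The same structure would be passed with ``TLS_RX`` to install the receive
direction, which is configured independently.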

``ktls`` can operate in three modes:

 * Software crypto mode (``TLS_SW``) - the CPU handles the cryptography.
   In most basic cases only crypto operations synchronous with the CPU
   can be used, but depending on calling context the CPU may utilize
   asynchronous crypto accelerators. The use of accelerators introduces extra
   latency on socket reads (decryption only starts when a read syscall
   is made) and additional I/O load on the system.
 * Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
   on a packet by packet basis, provided the packets arrive in order.
   This mode integrates best with the kernel stack and is described in detail
   in the remaining part of this document
   (``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
 * Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
   the NIC driver and firmware replace the kernel networking stack
   with their own TCP handling. It is not usable in production environments
   that make use of the Linux networking stack, for example of its firewalling
   abilities, QoS or packet scheduling (``ethtool`` flag ``tls-hw-record``).

The operation mode is selected automatically based on device configuration;
offload opt-in or opt-out on a per-connection basis is not currently supported.

TX
--

At a high level user write requests are turned into a scatter list, the TLS ULP
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
mode) and then hands the modified scatter list to the TCP layer. From this
point on the TCP stack proceeds as normal.

In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
Instead, packets reach a device driver, and the driver marks the packets
for crypto offload based on the socket the packet is attached to
and sends them to the device for encryption and transmission.

RX
--

On the receive side, if the device handled decryption and authentication
successfully, the driver will set the decrypted bit in the associated
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
are handled normally. ``ktls`` is informed when data is queued to the socket
and the ``strparser`` mechanism is used to delineate the records. Upon a read
request, records are retrieved from the socket and passed to the decryption
routine. If the device decrypted all the segments of the record the decryption
is skipped, otherwise the software path handles decryption.

.. kernel-figure::  tls-offload-layers.svg
   :alt:	TLS offload layers
   :align:	center
   :figwidth:	28em

   Layers of Kernel TLS stack

Device configuration
====================

During driver initialization the device sets the ``NETIF_F_HW_TLS_RX`` and
``NETIF_F_HW_TLS_TX`` features and installs its
:c:type:`struct tlsdev_ops <tlsdev_ops>`
pointer in the :c:member:`tlsdev_ops` member of the
:c:type:`struct net_device <net_device>`.

When TLS cryptographic connection state is installed on a ``ktls`` socket
(note that it is done twice, once for RX and once for TX direction,
and the two are completely independent), the kernel checks if the underlying
network device is offload-capable and attempts the offload. In case offload
fails the connection is handled entirely in software using the same mechanism
as if the offload was never tried.

The offload request is performed via the :c:member:`tls_dev_add` callback of
:c:type:`struct tlsdev_ops <tlsdev_ops>`:

.. code-block:: c

	int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
			   enum tls_offload_ctx_dir direction,
			   struct tls_crypto_info *crypto_info,
			   u32 start_offload_tcp_sn);

``direction`` indicates whether the cryptographic information is for
the received or transmitted packets. The driver uses the ``sk`` parameter
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
Cryptographic information in ``crypto_info`` includes the key, iv, salt
as well as the TLS record sequence number. ``start_offload_tcp_sn`` indicates
which TCP sequence number corresponds to the beginning of the record with
the sequence number from ``crypto_info``. The driver can add its state
at the end of kernel structures (see :c:member:`driver_state` members
in ``include/net/tls.h``) to avoid additional allocations and pointer
dereferences.

TX
--

After TX state is installed, the stack guarantees that the first segment
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
number, simplifying TCP sequence number matching.

TX offload being fully initialized does not imply that all segments passing
through the driver and which belong to the offloaded socket will be after
the expected sequence number and will have kernel record information.
In particular, already encrypted data may have been queued to the socket
before installing the connection state in the kernel.

RX
--

In the RX direction the local networking stack has little control over
the segmentation, so the initial record's TCP sequence number may be anywhere
inside the segment.

Normal operation
================

At the minimum the device maintains the following state for each connection, in
each direction:

 * crypto secrets (key, iv, salt)
 * crypto processing state (partial blocks, partial authentication tag, etc.)
 * record metadata (sequence number, processing offset and length)
 * expected TCP sequence number

There are no guarantees on record length or record segmentation. In particular
segments may start at any point of a record and contain any number of records.
Assuming segments are received in order, the device should be able to perform
crypto operations and authentication regardless of segmentation. For this
to be possible the device has to keep a small amount of segment-to-segment
state. This includes at least:

 * partial headers (if a segment carried only a part of the TLS header)
 * partial data block
 * partial authentication tag (all data had been seen but part of the
   authentication tag has to be written or read from the subsequent segment)

Record reassembly is not necessary for TLS offload. If the packets arrive
in order the device should be able to handle them separately and make
forward progress.
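
The segment-to-segment state can be illustrated with a small user space model
of record delineation. This is only a sketch under simplifying assumptions: it
tracks the partial header and the remaining record length but performs no
cryptography, and ``struct rec_parser`` and ``rec_parser_feed()`` are
hypothetical names, not kernel or device interfaces.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define TLS_HDR_LEN 5	/* content type (1), version (2), length (2) */

  /* Hypothetical per-direction state carried from segment to segment. */
  struct rec_parser {
      uint8_t  hdr[TLS_HDR_LEN]; /* partial header collected so far */
      size_t   hdr_have;         /* number of valid bytes in hdr[] */
      size_t   body_left;        /* bytes of the current record still expected */
      uint64_t records_done;     /* number of completely consumed records */
  };

  /* Consume the payload of one in-order TCP segment. */
  void rec_parser_feed(struct rec_parser *p, const uint8_t *data, size_t len)
  {
      while (len > 0) {
          if (p->body_left > 0) {
              /* Inside a record body: take what this segment offers. */
              size_t n = len < p->body_left ? len : p->body_left;

              data += n;
              len -= n;
              p->body_left -= n;
              if (p->body_left == 0)
                  p->records_done++;
              continue;
          }

          /* Collecting a header that may be split across segments. */
          p->hdr[p->hdr_have++] = *data++;
          len--;
          if (p->hdr_have == TLS_HDR_LEN) {
              p->body_left = ((size_t)p->hdr[3] << 8) | p->hdr[4];
              p->hdr_have = 0;
              if (p->body_left == 0)
                  p->records_done++;	/* zero-length record */
          }
      }
  }

Starting from a zeroed ``struct rec_parser``, the function can be called once
per segment regardless of where record boundaries fall, which is the property
the text above asks of the device.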

TX
--

The kernel stack performs record framing, reserving space for the
authentication tag and populating all other TLS header and trailer fields.

Both the device and the driver maintain expected TCP sequence numbers
due to the possibility of retransmissions and the lack of software fallback
once the packet reaches the device.
For segments passed in order, the driver marks the packets with
a connection identifier (note that a 5-tuple lookup is insufficient to identify
packets requiring HW offload, see the :ref:`5tuple_problems` section)
and hands them to the device. The device identifies the packet as requiring
TLS handling and confirms the sequence number matches its expectation.
The device performs encryption and authentication of the record data.
It replaces the authentication tag and TCP checksum with correct values.

RX
--

Before a packet is DMAed to the host (but after NIC's embedded switching
and packet transformation functions) the device validates the Layer 4
checksum and performs a 5-tuple lookup to find any TLS connection the packet
may belong to (technically a 4-tuple
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
is always TCP). If a connection is matched the device confirms that the TCP
sequence number is the expected one and proceeds to TLS handling (record
delineation, decryption, authentication for each record in the packet).
The device leaves the record framing unmodified; the stack takes care of
record decapsulation. The device indicates successful handling of TLS offload
in the per-packet context (descriptor) passed to the host.

Upon reception of a TLS offloaded packet, the driver sets
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
corresponding to the segment. The networking stack makes sure decrypted
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
and takes care of partial decryption.

Resync handling
===============

In presence of packet drops or network packet reordering, the device may lose
synchronization with the TLS stream, and require a resync with the kernel's
TCP stack.

Note that resync is only attempted for connections which were successfully
added to the device table and are in ``TLS_HW`` mode. For example,
if the table was full when cryptographic state was installed in the kernel,
such a connection will never get offloaded. Therefore the resync request
does not carry any cryptographic connection state.

TX
--

Segments transmitted from an offloaded socket can get out of sync
in ways similar to the receive side: retransmissions and local drops
are possible, though network reorders are not. There are currently
two mechanisms for dealing with out of order segments.

Crypto state rebuilding
~~~~~~~~~~~~~~~~~~~~~~~

Whenever an out of order segment is transmitted the driver provides
the device with enough information to perform cryptographic operations.
This means most likely that the part of the record preceding the current
segment has to be passed to the device as part of the packet context,
together with its TCP sequence number and TLS record number. The device
can then initialize its crypto state, process and discard the preceding
data (to be able to insert the authentication tag) and move onto handling
the actual packet.
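
A minimal sketch of the lookup a driver might perform is shown below. It
assumes the driver can see a list of records that are still unacknowledged;
``struct tx_record`` and ``tx_find_record()`` are hypothetical names used for
illustration only. The returned ``replay_len`` is the number of bytes of the
record that precede the out of order segment and would have to be replayed
to the device.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical description of a record kept until it is acknowledged. */
  struct tx_record {
      uint32_t start_seq; /* TCP sequence number of the first record byte */
      uint32_t len;       /* record length on the wire, header included */
      uint64_t rec_no;    /* TLS record sequence number */
  };

  /*
   * Find the record covering TCP sequence number seq. On success store
   * in *replay_len how many bytes of that record precede seq and return
   * the record. Return NULL if no tracked record covers seq.
   */
  const struct tx_record *tx_find_record(const struct tx_record *recs,
                                         size_t n, uint32_t seq,
                                         uint32_t *replay_len)
  {
      for (size_t i = 0; i < n; i++) {
          /* Unsigned subtraction stays correct across sequence wrap. */
          uint32_t off = seq - recs[i].start_seq;

          if (off < recs[i].len) {
              *replay_len = off;
              return &recs[i];
          }
      }
      return NULL;
  }

A ``NULL`` result corresponds to the case where the associated record cannot
be found, in which the segment cannot be encrypted by the device.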

In this mode, depending on the implementation, the driver can either ask
for a continuation with the crypto state and the new sequence number
(next expected segment is the one after the out of order one), or continue
with the previous stream state - assuming that the out of order segment
was just a retransmission. The former is simpler, and does not require
retransmission detection; therefore it is the recommended method until
such time as it is proven inefficient.

Next record sync
~~~~~~~~~~~~~~~~

Whenever an out of order segment is detected the driver requests
that the ``ktls`` software fallback code encrypt it. If the segment's
sequence number is lower than expected the driver assumes retransmission
and doesn't change device state. If the segment is in the future, it
may imply a local drop; the driver asks the stack to sync the device
to the next record state and falls back to software.
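
The three cases can be told apart with wraparound-safe 32-bit sequence
arithmetic. The sketch below is a user space illustration; the enum and
``tx_classify()`` are hypothetical names, and the signed-difference comparison
is the usual technique for TCP sequence numbers rather than code taken from
any driver.

.. code-block:: c

  #include <stdint.h>

  enum tx_seq_class {
      TX_SEQ_IN_ORDER,       /* matches the expected sequence number */
      TX_SEQ_RETRANSMISSION, /* lower than expected, device state kept */
      TX_SEQ_FUTURE,         /* higher than expected, resync is requested */
  };

  /* Classify a segment relative to the expected TCP sequence number. */
  enum tx_seq_class tx_classify(uint32_t seg_seq, uint32_t expected_seq)
  {
      /* The signed difference handles sequence number wraparound. */
      int32_t delta = (int32_t)(seg_seq - expected_seq);

      if (delta == 0)
          return TX_SEQ_IN_ORDER;
      return delta < 0 ? TX_SEQ_RETRANSMISSION : TX_SEQ_FUTURE;
  }

In both out of order cases the segment itself is encrypted by the software
fallback; only the ``TX_SEQ_FUTURE`` case leads to a resync request.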

Resync request is indicated with:

.. code-block:: c

  void tls_offload_tx_resync_request(struct sock *sk, u32 got_seq, u32 exp_seq)

Until resync is complete the driver should not access its expected TCP
sequence number (as it will be updated from a different context).
The following helper should be used to test if resync is complete:

.. code-block:: c

  bool tls_offload_tx_resync_pending(struct sock *sk)

Next time ``ktls`` pushes a record it will first send its TCP sequence number
and TLS record number to the driver. The stack will also make sure that
the new record will start on a segment boundary (like it does when
the connection is initially added).

RX
--

A small amount of RX reorder events may not require a full resynchronization.
In particular the device should not lose synchronization
when the record boundary can be recovered:

.. kernel-figure::  tls-offload-reorder-good.svg
   :alt:	reorder of non-header segment
   :align:	center

   Reorder of non-header segment

Green segments are successfully decrypted, blue ones are passed
as received on wire, red stripes mark start of new records.

In the above case segment 1 is received and decrypted successfully.
Segment 2 was dropped so 3 arrives out of order. The device knows
the next record starts inside 3, based on record length in segment 1.
Segment 3 is passed untouched, because due to lack of data from segment 2
the remainder of the previous record inside segment 3 cannot be handled.
The device can, however, collect the authentication algorithm's state
and partial block from the new record in segment 3 and when 4 and 5
arrive continue decryption. Finally when 2 arrives it's completely outside
of the expected window of the device so it's passed as is without special
handling. ``ktls`` software fallback handles the decryption of the record
spanning segments 1, 2 and 3. The device did not get out of sync,
even though two segments did not get decrypted.

Kernel synchronization may be necessary if the lost segment contained
a record header and arrived after the next record header has already passed:

.. kernel-figure::  tls-offload-reorder-bad.svg
   :alt:	reorder of header segment
   :align:	center

   Reorder of segment with a TLS header

In this example segment 2 gets dropped, and it contains a record header.
The device can only detect that segment 4 also contains a TLS header
if it knows the length of the previous record from segment 2. In this case
the device will lose synchronization with the stream.

Stream scan resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the device gets out of sync and the stream reaches TCP sequence
numbers more than a max size record past the expected TCP sequence number,
the device starts scanning for a known header pattern. For example
for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
in the SSL/TLS version field of the header. Once the pattern is matched
the device continues attempting to parse headers at expected locations
(based on the length fields at guessed locations).
Whenever the expected location does not contain a valid header the scan
is restarted.
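
The scan can be sketched in user space as follows. This is an illustration
under stated assumptions, not a description of any device: it operates on a
contiguous buffer rather than a packet stream, it accepts a candidate as soon
as one following header is found at the expected location, and the length
bound is an assumed upper limit for a TLS 1.2 record. ``hdr_plausible()`` and
``tls_scan_for_header()`` are hypothetical names.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  #define TLS_HDR_LEN     5
  #define TLS_MAX_REC_LEN (16384 + 2048)	/* assumed TLS 1.2 record bound */

  /* Does a plausible TLS 1.2/1.3 record header start at buf[off]? */
  static int hdr_plausible(const uint8_t *buf, size_t len, size_t off)
  {
      if (off + TLS_HDR_LEN > len)
          return 0;
      /* Version field of the header carries 0x03 0x03. */
      if (buf[off + 1] != 0x03 || buf[off + 2] != 0x03)
          return 0;
      return (((size_t)buf[off + 3] << 8) | buf[off + 4]) <= TLS_MAX_REC_LEN;
  }

  /*
   * Scan buf for a candidate header whose length field leads to another
   * plausible header. Returns the candidate offset, or -1 if none found.
   */
  long tls_scan_for_header(const uint8_t *buf, size_t len)
  {
      for (size_t off = 0; off + TLS_HDR_LEN <= len; off++) {
          size_t rec_len, next;

          if (!hdr_plausible(buf, len, off))
              continue;
          rec_len = ((size_t)buf[off + 3] << 8) | buf[off + 4];
          next = off + TLS_HDR_LEN + rec_len;
          if (hdr_plausible(buf, len, next))
              return (long)off;
          /* Expected location held no header: keep scanning. */
      }
      return -1;
  }

A match found this way is still only a guess, which is why it has to be
confirmed by the kernel before decryption resumes.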

When the header is matched the device sends a confirmation request
to the kernel, asking if the guessed location is correct (if a TLS record
really starts there), and which record sequence number the given header had.
The kernel confirms the guessed location was correct and tells the device
the record sequence number. Meanwhile, the device had been parsing
and counting all records since the just-confirmed one; it adds the number
of records it had seen to the record number provided by the kernel.
At this point the device is in sync and can resume decryption at the next
segment boundary.

In a pathological case the device may latch onto a sequence of matching
headers and never hear back from the kernel (there is no negative
confirmation from the kernel). The implementation may choose to periodically
restart the scan. Given how unlikely a falsely-matching stream is, however,
periodic restart is not deemed necessary.

Special care has to be taken if the confirmation request is passed
asynchronously to the packet stream and the record may get processed
by the kernel before the confirmation request.

Stack-driven resynchronization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The driver may also request the stack to perform resynchronization
whenever it sees the records are no longer getting decrypted.
If the connection is configured in this mode the stack automatically
schedules resynchronization after it has received two completely encrypted
records.

The stack waits for the socket to drain and informs the device about
the next expected record number and its TCP sequence number. If the
records continue to be received fully encrypted the stack retries the
synchronization with an exponential back off (first after 2 encrypted
records, then after 4 records, after 8, after 16... up until every
128 records).
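
The retry schedule described above can be modelled with a small counter.
The sketch below only mirrors the schedule as worded in this document
(2, 4, 8, 16, ... capped at 128); it is not the kernel's implementation, and
``struct resync_backoff`` and its helpers are hypothetical names.

.. code-block:: c

  #include <stdint.h>

  #define RESYNC_FIRST_THRESHOLD 2
  #define RESYNC_MAX_THRESHOLD   128

  /* Hypothetical bookkeeping for stack-driven resync retries. */
  struct resync_backoff {
      uint32_t encrypted_run; /* fully encrypted records since last attempt */
      uint32_t threshold;     /* records to wait before the next attempt */
  };

  void resync_backoff_init(struct resync_backoff *b)
  {
      b->encrypted_run = 0;
      b->threshold = RESYNC_FIRST_THRESHOLD;
  }

  /* A record decrypted by the device means the device is in sync again. */
  void resync_backoff_decrypted(struct resync_backoff *b)
  {
      resync_backoff_init(b);
  }

  /*
   * Account for one fully encrypted record. Returns 1 when a resync
   * attempt should be scheduled now, 0 otherwise.
   */
  int resync_backoff_encrypted(struct resync_backoff *b)
  {
      if (++b->encrypted_run < b->threshold)
          return 0;
      b->encrypted_run = 0;
      if (b->threshold < RESYNC_MAX_THRESHOLD)
          b->threshold *= 2;	/* exponential back off, capped */
      return 1;
  }

With a stream that stays encrypted this model schedules attempts after
records 2, 6, 14 and 30, and eventually once every 128 records.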

Error handling
==============

TX
--

Packets may be redirected or rerouted by the stack to a different
device than the selected TLS offload device. The stack will handle
such a condition using the :c:func:`sk_validate_xmit_skb` helper
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
Offload maintains information about all records until the data is
fully acknowledged, so if skbs reach the wrong device they can be handled
by software fallback.

Any device TLS offload handling error on the transmission side must result
in the packet being dropped. For example if a packet got out of order
due to a bug in the stack or the device, reached the device and can't
be encrypted, such a packet must be dropped.

RX
--

If the device encounters any problems with TLS offload on the receive
side it should pass the packet to the host's networking stack as it was
received on the wire.

For example authentication failure for any record in the segment should
result in passing the unmodified packet to the software fallback. This means
packets should not be modified "in place". Splitting segments to handle partial
decryption is not advised. In other words either all records in the packet
had been handled successfully and authenticated or the packet has to be passed
to the host's stack as it was on the wire (recovering the original packet in
the driver if the device provides a precise error is sufficient).

The Linux networking stack does not provide a way of reporting per-packet
decryption and authentication errors; packets with errors must simply not
have the :c:member:`decrypted` mark set.

A packet should also not be handled by the TLS offload if it contains
incorrect checksums.
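
The all-or-nothing rule for the receive side can be summarised as a single
predicate. The sketch below is purely illustrative: ``struct rx_tls_result``
stands for whatever a device reports in its per-packet descriptor, and its
fields as well as ``rx_may_mark_decrypted()`` are hypothetical.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical per-packet status reported by the device. */
  struct rx_tls_result {
      bool csum_ok;     /* Layer 4 checksum was validated */
      bool tls_handled; /* packet matched a connection and was processed */
      bool tls_error;   /* any record failed decryption or authentication */
  };

  /* Return true only if the whole packet may carry the decrypted mark. */
  bool rx_may_mark_decrypted(const struct rx_tls_result *r)
  {
      if (!r->csum_ok)
          return false;	/* bad checksum: no TLS handling at all */
      return r->tls_handled && !r->tls_error;
  }

When the predicate is false the packet is expected to reach the stack exactly
as it was received on the wire, without the mark.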

Performance metrics
===================

TLS offload can be characterized by the following basic metrics:

 * max connection count
 * connection installation rate
 * connection installation latency
 * total cryptographic performance

Note that each TCP connection requires a TLS session in both directions;
the performance may be reported treating each direction separately.

Max connection count
--------------------

The number of connections a device can support can be exposed via
the ``devlink resource`` API.

Total cryptographic performance
-------------------------------

Offload performance may depend on segment and record size.

Overload of the cryptographic subsystem of the device should not have
significant performance impact on non-offloaded streams.
420f42c104fSJakub Kicinski
421f42c104fSJakub KicinskiStatistics
422f42c104fSJakub Kicinski==========
423f42c104fSJakub Kicinski
424f42c104fSJakub KicinskiFollowing minimum set of TLS-related statistics should be reported
425f42c104fSJakub Kicinskiby the driver:
426f42c104fSJakub Kicinski
427280c0899STariq Toukan * ``rx_tls_decrypted_packets`` - number of successfully decrypted RX packets
428280c0899STariq Toukan   which were part of a TLS stream.
429280c0899STariq Toukan * ``rx_tls_decrypted_bytes`` - number of TLS payload bytes in RX packets
430280c0899STariq Toukan   which were successfully decrypted.
43176c1e1acSTariq Toukan * ``rx_tls_ctx`` - number of TLS RX HW offload contexts added to device for
43276c1e1acSTariq Toukan   decryption.
43376c1e1acSTariq Toukan * ``rx_tls_del`` - number of TLS RX HW offload contexts deleted from device
43476c1e1acSTariq Toukan   (connection has finished).
43576c1e1acSTariq Toukan * ``rx_tls_resync_req_pkt`` - number of received TLS packets with a resync
43676c1e1acSTariq Toukan    request.
43776c1e1acSTariq Toukan * ``rx_tls_resync_req_start`` - number of times the TLS async resync request
43876c1e1acSTariq Toukan    was started.
43976c1e1acSTariq Toukan * ``rx_tls_resync_req_end`` - number of times the TLS async resync request
44076c1e1acSTariq Toukan    properly ended with providing the HW tracked tcp-seq.
44176c1e1acSTariq Toukan * ``rx_tls_resync_req_skip`` - number of times the TLS async resync request
44276c1e1acSTariq Toukan    procedure was started by not properly ended.
44376c1e1acSTariq Toukan * ``rx_tls_resync_res_ok`` - number of times the TLS resync response call to
44476c1e1acSTariq Toukan    the driver was successfully handled.
44576c1e1acSTariq Toukan * ``rx_tls_resync_res_skip`` - number of times the TLS resync response call to
44676c1e1acSTariq Toukan    the driver was terminated unsuccessfully.
44776c1e1acSTariq Toukan * ``rx_tls_err`` - number of RX packets which were part of a TLS stream
44876c1e1acSTariq Toukan   but were not decrypted due to unexpected error in the state machine.
 * ``tx_tls_encrypted_packets`` - number of TX packets passed to the device
   for encryption of their TLS payload.
 * ``tx_tls_encrypted_bytes`` - number of TLS payload bytes in TX packets
   passed to the device for encryption.
 * ``tx_tls_ctx`` - number of TLS TX HW offload contexts added to the device
   for encryption.
 * ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
   but did not arrive in the expected order.
 * ``tx_tls_skip_no_sync_data`` - number of TX packets which were part of
   a TLS stream and arrived out of order, but skipped the HW offload routine
   and went to the regular transmit flow as they were retransmissions of the
   connection handshake.
 * ``tx_tls_drop_no_sync_data`` - number of TX packets which were part of
   a TLS stream and were dropped because they arrived out of order and the
   associated record could not be found.
 * ``tx_tls_drop_bypass_req`` - number of TX packets which were part of a TLS
   stream and were dropped because they contain both data that has been
   encrypted by software and data that expects hardware crypto offload.
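
As an illustration of how these counters might be consumed, the following is
a minimal Python sketch that extracts TLS counters from text in the
``name: value`` layout printed by ``ethtool -S``. It assumes the driver exposes
the counters under exactly the names listed above. The sample text and the
helper names ``parse_tls_counters`` and ``tx_drops`` are hypothetical and only
serve the example.

```python
# Hypothetical sample in the "name: value" layout printed by `ethtool -S`.
SAMPLE = """\
NIC statistics:
     rx_packets: 1200
     rx_tls_resync_req_start: 4
     rx_tls_resync_req_end: 3
     rx_tls_resync_req_skip: 1
     rx_tls_err: 0
     tx_tls_ooo: 7
     tx_tls_drop_no_sync_data: 2
     tx_tls_drop_bypass_req: 0
"""


def parse_tls_counters(text):
    """Return {name: int} for every counter whose name contains '_tls_'."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Skip the header line, non-TLS counters and non-numeric values.
        if not sep or "_tls_" not in name or not value.isdigit():
            continue
        counters[name] = int(value)
    return counters


def tx_drops(counters):
    """Sum all TX drop counters (names starting with 'tx_tls_drop_')."""
    return sum(v for k, v in counters.items() if k.startswith("tx_tls_drop_"))


counters = parse_tls_counters(SAMPLE)
print(counters["tx_tls_ooo"], tx_drops(counters))
```

Comparing two snapshots taken some time apart (rather than absolute values)
is usually more informative, since the counters only ever increase.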

Notable corner cases, exceptions and additional requirements
============================================================

.. _5tuple_problems:

5-tuple matching limitations
----------------------------

The device can only recognize received packets based on the 5-tuple
of the socket. The current ``ktls`` implementation will not offload sockets
routed through software interfaces such as those used for tunneling
or virtual networking. However, many packet transformations performed
by the networking stack (most notably any BPF logic) do not require
any intermediate software device, therefore a 5-tuple match may
consistently miss at the device level. In such cases the device
should still be able to perform TX offload (encryption) and should
fall back cleanly to software decryption (RX).
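
The miss described above can be pictured with a toy model. The Python sketch
below is not how any driver or device is implemented. It assumes the device
keeps an exact-match table keyed by the 5-tuple that the socket sees, and that
some transformation in the stack (for example address rewriting in BPF) makes
the on-wire tuple differ from the socket's tuple. All names (``FiveTuple``,
``ToyDevice``) and addresses are hypothetical.

```python
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "proto src_ip src_port dst_ip dst_port")


class ToyDevice:
    """Toy RX path: exact-match lookup on the 5-tuple, nothing else."""

    def __init__(self):
        self.rx_flows = {}

    def add_rx_offload(self, flow, crypto_state):
        # The offload is installed with the tuple as seen by the socket.
        self.rx_flows[flow] = crypto_state

    def receive(self, wire_flow):
        # "hw": the device matched the flow and would decrypt the packet.
        # "sw": no match, the packet is passed up still encrypted and the
        #       stack decrypts it in software.
        return "hw" if wire_flow in self.rx_flows else "sw"


dev = ToyDevice()

# Tuple as seen by the socket (after the stack rewrote the destination).
socket_flow = FiveTuple("tcp", "198.51.100.7", 40000, "10.0.0.1", 443)
dev.add_rx_offload(socket_flow, crypto_state="keys")

# Tuple actually carried by packets on the wire.
wire_flow = socket_flow._replace(dst_ip="192.0.2.10")

print(dev.receive(socket_flow))  # matches the installed entry
print(dev.receive(wire_flow))    # consistently misses, software decrypts
```

In this model the miss is not an error. The connection keeps working, only
the RX direction is decrypted by the CPU instead of the device.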

Out of order
------------

Introducing extra processing in NICs should not cause packets to be
transmitted or received out of order, for example pure ACK packets
should not be reordered with respect to data segments.

Ingress reorder
---------------

A device is permitted to perform packet reordering for consecutive
TCP segments (i.e. placing packets in the correct order) but any form
of additional buffering is disallowed.

Coexistence with standard networking offload features
-----------------------------------------------------

Offloaded ``ktls`` sockets should support standard TCP stack features
transparently. Enabling device TLS offload should not cause any difference
in packets as seen on the wire.

Transport layer transparency
----------------------------

The device should not modify any packet headers for the purpose
of simplifying TLS offload.

The device should not depend on any packet headers beyond what is strictly
necessary for TLS offload.

Segment drops
-------------

Dropping packets is acceptable only in the event of catastrophic
system errors and should never be used as an error handling mechanism
in cases arising from normal operation. In other words, reliance
on TCP retransmissions to handle corner cases is not acceptable.

TLS device features
-------------------

Drivers should ignore changes to the TLS device feature flags.
These flags will be acted upon accordingly by the core ``ktls`` code.
TLS device feature flags only control adding of new TLS connection
offloads, old connections will remain active after flags are cleared.

TLS encryption cannot be offloaded to devices without checksum calculation
offload. Hence, the TLS TX device feature flag requires TX checksum offload
to be set. Disabling the latter implies clearing the former. Disabling TX
checksum offload should not affect old connections, and drivers should make
sure checksum calculation does not break for them.
Similarly, device-offloaded TLS decryption implies RX checksum offload
(RXCSUM). If the user does not enable RX checksum offload, the TLS RX device
feature is disabled as well.
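
The dependency between the checksum and TLS feature flags can be summarised
with a small model. The Python sketch below is illustrative only. The flag
constants are hypothetical stand-ins and not the kernel's feature bits, and
``fix_features`` merely mirrors the rule stated above: a TLS flag is cleared
whenever the checksum offload for the same direction is not set.

```python
# Hypothetical stand-ins for the device feature bits discussed above.
TX_CSUM = 1 << 0
RX_CSUM = 1 << 1
TLS_TX = 1 << 2
TLS_RX = 1 << 3


def fix_features(requested):
    """Drop TLS offload flags whose checksum prerequisite is missing."""
    features = requested
    if not features & TX_CSUM:
        features &= ~TLS_TX  # TLS TX requires TX checksum offload
    if not features & RX_CSUM:
        features &= ~TLS_RX  # TLS RX requires RX checksum offload
    return features


def may_offload_new_tx(features):
    """The flags only gate *new* offloads, existing ones are untouched."""
    return bool(features & TLS_TX)


everything = TX_CSUM | RX_CSUM | TLS_TX | TLS_RX
after_tx_csum_off = fix_features(everything & ~TX_CSUM)

print(may_offload_new_tx(everything))         # new TX offloads allowed
print(may_offload_new_tx(after_tx_csum_off))  # new TX offloads refused
print(bool(after_tx_csum_off & TLS_RX))       # RX side is unaffected
```

Connections offloaded before the flag was cleared are deliberately absent from
this model: per the text above they stay offloaded, and the driver has to keep
checksum calculation working for them.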