.. SPDX-License-Identifier: GPL-2.0
.. _xfrm_device:

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to offer the stack access to the
hardware offload.

Right now, there are two types of hardware offload that the kernel supports:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC have SA and policy in-sync
   * NIC handles the SA and policy states
   * The kernel talks to the key manager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
	/* Crypto and Packet offload callbacks */
	int	(*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
	void	(*xdo_dev_state_free) (struct xfrm_state *x);
	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
				       struct xfrm_state *x);
	void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);

	/* Solely packet offload callbacks */
	void    (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
	int	(*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

A NIC driver offering IPsec offload needs to implement the callbacks
relevant to the offload type it supports to make the offload available to
the network stack's XFRM subsystem. Additionally, the feature bits
NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM signal the availability of the
offload.
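
As a rough illustration of how the callbacks and the feature bits relate,
the user-space sketch below checks a device description for consistency.
It assumes, for the sake of the example, that a device advertising ESP
offload must supply at least the state add and delete callbacks, and that
the Tx checksum bit only makes sense together with the ESP bit.  All mock_
and MOCK_ names, the flag values, and the reduced callback table are
hypothetical stand-ins and not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel feature bits; values are arbitrary. */
#define MOCK_F_HW_ESP          (1u << 0)
#define MOCK_F_HW_ESP_TX_CSUM  (1u << 1)

struct mock_state;	/* stands in for struct xfrm_state */

/* Reduced copy of the callback table, crypto offload subset only. */
struct mock_xfrmdev_ops {
	int  (*state_add)(struct mock_state *x);
	void (*state_delete)(struct mock_state *x);
	void (*state_free)(struct mock_state *x);
};

struct mock_netdev {
	unsigned int features;
	const struct mock_xfrmdev_ops *ops;
};

/*
 * Assumed rule set for this sketch:
 *  - the Tx checksum bit requires the ESP bit;
 *  - the ESP bit requires the state_add and state_delete callbacks.
 */
bool mock_xfrm_ops_consistent(const struct mock_netdev *dev)
{
	if ((dev->features & MOCK_F_HW_ESP_TX_CSUM) &&
	    !(dev->features & MOCK_F_HW_ESP))
		return false;

	if (dev->features & MOCK_F_HW_ESP)
		return dev->ops && dev->ops->state_add &&
		       dev->ops->state_delete;

	return true;
}

/* Trivial driver stubs used to populate the table. */
int mock_state_add(struct mock_state *x) { (void)x; return 0; }
void mock_state_delete(struct mock_state *x) { (void)x; }

const struct mock_xfrmdev_ops mock_ops = {
	.state_add    = mock_state_add,
	.state_delete = mock_state_delete,
};
```

A device with both bits set and mock_ops attached passes the check, while
one that sets only the Tx checksum bit, or sets the ESP bit without a
callback table, does not.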



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

		adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
		adapter->netdev->features |= NETIF_F_HW_ESP;
		adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should:

	- verify the algorithm is supported for offloads
	- store the SA information (key, salt, target-ip, protocol, etc)
	- enable the HW offload of the SA
	- return status value:

		===========   ===================================
		0             success
		-EOPNOTSUPP   offload not supported, try SW IPsec,
		              not applicable for packet offload mode
		other         fail the request
		===========   ===================================
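
The steps and return values above can be sketched in user space as
follows.  This is only an illustration under stated assumptions: the
mock_sa and mock_hw types, the table depth, and the choice of supported
algorithm and key sizes are hypothetical, and a real driver works on
struct xfrm_state and its own hardware tables instead.

```c
#include <errno.h>
#include <string.h>

#define MOCK_SA_TABLE_SIZE 4	/* hypothetical HW table depth */

/* Hypothetical, reduced view of an SA as a driver might see it. */
struct mock_sa {
	const char *aead_name;	/* e.g. "rfc4106(gcm(aes))" */
	unsigned int key_bits;	/* AES key length in bits */
	unsigned int spi;
};

struct mock_hw {
	struct mock_sa table[MOCK_SA_TABLE_SIZE];
	int used[MOCK_SA_TABLE_SIZE];
};

/*
 * Returns 0 on success, -EOPNOTSUPP when the algorithm cannot be
 * offloaded (so that crypto offload may fall back to software), and
 * another negative errno (here -ENOSPC) to fail the request.
 */
int mock_state_add(struct mock_hw *hw, const struct mock_sa *sa)
{
	int i;

	/* 1. verify the algorithm is supported for offload */
	if (!sa->aead_name ||
	    strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0)
		return -EOPNOTSUPP;
	if (sa->key_bits != 128 && sa->key_bits != 256)
		return -EOPNOTSUPP;

	/* 2. store the SA information and 3. enable the HW entry */
	for (i = 0; i < MOCK_SA_TABLE_SIZE; i++) {
		if (!hw->used[i]) {
			hw->table[i] = *sa;
			hw->used[i] = 1;
			return 0;
		}
	}

	/* 4. any other error fails the request */
	return -ENOSPC;
}
```

In this sketch an unsupported algorithm and a full table are reported
differently on purpose, matching the distinction the table draws between
-EOPNOTSUPP and any other error.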

The driver can also set an offload_handle in the SA, an opaque handle
that can be used to convey context into the fast-path offload requests::

		xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  This can check the packet information to be sure the
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
return true or false to signify its support.
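
A minimal user-space sketch of such a check is shown below.  It assumes
hardware that can handle IPv4 without options and plain IPv6; the
mock_pkt type and mock_offload_ok() are hypothetical, and a real
callback would inspect the skb and the xfrm_state instead of a raw
buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of an outgoing packet's outer IP header. */
struct mock_pkt {
	const uint8_t *l3;	/* start of the IP header */
	size_t l3_len;		/* bytes available from l3 */
};

/*
 * Decide whether this packet can use the offload.  Assumed HW limits
 * for the sketch: IPv4 without options, or IPv6 (extension headers are
 * not inspected here).
 */
bool mock_offload_ok(const struct mock_pkt *pkt)
{
	uint8_t version, ihl;

	if (!pkt->l3 || pkt->l3_len < 1)
		return false;

	version = pkt->l3[0] >> 4;

	if (version == 4) {
		/* IHL counts 32-bit words; 5 means 20 bytes, no options */
		ihl = pkt->l3[0] & 0x0f;
		return ihl == 5 && pkt->l3_len >= 20;
	}

	if (version == 6)
		return pkt->l3_len >= 40;	/* fixed IPv6 header size */

	return false;
}
```

Returning false here only declines the offload for that packet; it is
not an error condition.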

Crypto offload mode:

When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

		xs = xfrm_input_state(skb);
		context = xs->xso.offload_handle;
		set up HW for send

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

	find and hold the SA that was used for the Rx skb::

		get spi, protocol, and destination IP from packet headers
		xs = find xs from (spi, protocol, dest_IP)
		xfrm_state_hold(xs);

	store the state information into the skb::

		sp = secpath_set(skb);
		if (!sp) return;
		sp->xvec[sp->len++] = xs;
		sp->olen++;

	indicate the success and/or error status of the offload::

		xo = xfrm_offload(skb);
		xo->flags = CRYPTO_DONE;
		xo->status = crypto_status;

	hand the packet to napi_gro_receive() as usual
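
The Rx bookkeeping above can be modelled in user space roughly as
follows.  Everything here is a mock: mock_skb, mock_sec_path,
mock_offload, the capacity MOCK_MAX_DEPTH and the flag value
MOCK_CRYPTO_DONE are hypothetical stand-ins chosen for the sketch, and
the reference count increment only imitates what xfrm_state_hold() does.

```c
#include <stddef.h>

#define MOCK_MAX_DEPTH   6		/* assumed secpath capacity */
#define MOCK_CRYPTO_DONE (1u << 0)	/* placeholder flag value */

struct mock_state {
	int refcnt;
	unsigned int spi;
};

struct mock_sec_path {
	int len;			/* states recorded so far */
	int olen;			/* offloaded states recorded */
	struct mock_state *xvec[MOCK_MAX_DEPTH];
};

struct mock_offload {
	unsigned int flags;
	unsigned int status;
};

struct mock_skb {
	struct mock_sec_path sp;
	struct mock_offload xo;
};

/* Returns 0 when the SA was attached, -1 when the skb is left untouched. */
int mock_rx_attach_sa(struct mock_skb *skb, struct mock_state *xs,
		      unsigned int crypto_status)
{
	struct mock_sec_path *sp = &skb->sp;

	if (!xs || sp->len >= MOCK_MAX_DEPTH)
		return -1;

	xs->refcnt++;			/* stands in for xfrm_state_hold() */

	sp->xvec[sp->len++] = xs;	/* record the SA used by the HW */
	sp->olen++;

	skb->xo.flags = MOCK_CRYPTO_DONE;	/* HW already did the crypto */
	skb->xo.status = crypto_status;		/* success or error from HW */
	return 0;
}
```

The sketch takes the reference only after it knows the SA can be
recorded, so no reference has to be dropped on the failure path.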

In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

Packet offload mode:

The HW adds and removes the XFRM headers. In the Rx path, the XFRM stack is
therefore bypassed if the HW reported success. In the Tx path, the packet
leaves the kernel without the extra header and unencrypted; the HW is
responsible for adding the header and performing the encryption.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.
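
One possible split of work between the two phases is sketched below in
user space.  The mock_sa_ctx context, the mock_state type and the three
functions are hypothetical; the point is only that delete stops using the
hardware entry while free releases what is left once nothing references
the state any more.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-SA driver context kept behind offload_handle. */
struct mock_sa_ctx {
	bool hw_enabled;	/* HW entry is active */
	int  table_idx;		/* slot in a hypothetical HW SA table */
};

struct mock_state {
	struct mock_sa_ctx *offload_handle;
};

/* Helper for the example: allocate and enable a context. */
int mock_state_add(struct mock_state *x, int idx)
{
	struct mock_sa_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return -1;
	ctx->hw_enabled = true;
	ctx->table_idx = idx;
	x->offload_handle = ctx;
	return 0;
}

/* Phase 1: stop using the HW entry; the state may still be referenced. */
void mock_state_delete(struct mock_state *x)
{
	if (x->offload_handle)
		x->offload_handle->hw_enabled = false;
}

/* Phase 2: last reference is gone, release remaining driver resources. */
void mock_state_free(struct mock_state *x)
{
	free(x->offload_handle);
	x->offload_handle = NULL;
}
```

Keeping the context alive between the two calls lets in-flight users of
the state still find valid driver data after the offload was disabled.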

As a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states and policies.

Because the HW handles the packets, the XFRM core can't count them against
the hard and soft limits. The HW/driver is responsible for doing so and for
providing accurate data when xdo_dev_state_update_curlft() is called. When
one of these limits is reached, the driver needs to call
xfrm_state_check_expire() to make sure that XFRM performs the rekeying
sequence.
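
The interplay between the refreshed counters and the limits can be
pictured with the user-space sketch below.  It tracks packet limits only,
and the mock_lifetime type, the mock_expire values and both functions are
hypothetical; mock_check_expire() merely stands in for the kind of
decision the core takes once the driver has supplied current numbers.

```c
#include <stdint.h>

/* Hypothetical, reduced lifetime bookkeeping for one offloaded SA. */
struct mock_lifetime {
	uint64_t soft_packet_limit;
	uint64_t hard_packet_limit;
	uint64_t cur_packets;	/* mirrors the current-lifetime counter */
};

enum mock_expire { MOCK_LFT_OK, MOCK_LFT_SOFT, MOCK_LFT_HARD };

/* Driver side: copy the HW packet counter into the state on request. */
void mock_update_curlft(struct mock_lifetime *lft, uint64_t hw_packets)
{
	lft->cur_packets = hw_packets;
}

/* Stand-in for the expiry check performed on the refreshed data. */
enum mock_expire mock_check_expire(const struct mock_lifetime *lft)
{
	if (lft->cur_packets >= lft->hard_packet_limit)
		return MOCK_LFT_HARD;
	if (lft->cur_packets >= lft->soft_packet_limit)
		return MOCK_LFT_SOFT;
	return MOCK_LFT_OK;
}
```

With a soft limit of 100 and a hard limit of 200 packets, a HW counter
of 50 yields MOCK_LFT_OK, 100 yields MOCK_LFT_SOFT and 250 yields
MOCK_LFT_HARD.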