.. SPDX-License-Identifier: GPL-2.0

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to offer the stack access to the
hardware offload.

Right now, there are two types of hardware offload that the kernel supports:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC keep SA and policy in sync
   * NIC handles the SA and policy states
   * Kernel talks to the key manager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
	/* Crypto and Packet offload callbacks */
	int	(*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_state_delete) (struct xfrm_state *x);
	void	(*xdo_dev_state_free) (struct xfrm_state *x);
	bool	(*xdo_dev_offload_ok) (struct sk_buff *skb,
				       struct xfrm_state *x);
	void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);

	/* Solely packet offload callbacks */
	void    (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
	int	(*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
	void	(*xdo_dev_policy_delete) (struct xfrm_policy *x);
	void	(*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

A NIC driver offering IPsec offload needs to implement the callbacks
relevant to the offload type it supports in order to make the offload
available to the network stack's XFRM subsystem. Additionally, the feature
bits NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM signal the availability of
the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

		adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
		adapter->netdev->features |= NETIF_F_HW_ESP;
		adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should:

	- verify the algorithm is supported for offloads
	- store the SA information (key, salt, target-ip, protocol, etc)
	- enable the HW offload of the SA
	- return status value:

		===========   ======================================
		0             success
		-EOPNOTSUPP   offload not supported, try SW IPsec;
		              not applicable for packet offload mode
		other         fail the request
		===========   ======================================

The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests::

		xs->xso.offload_handle = context;


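The steps expected from xdo_dev_state_add() can be sketched in plain
userspace C.  This is an illustration only, not driver code: struct mock_sa,
struct mock_hw, mock_state_add() and the two-entry table are hypothetical
stand-ins for struct xfrm_state, the device's SA table and a driver's
callback, and the imagined hardware is assumed to support only
rfc4106(gcm(aes)).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of struct xfrm_state used here. */
struct mock_sa {
	const char *aead_name;	/* algorithm name, e.g. "rfc4106(gcm(aes))" */
	void *offload_handle;	/* opaque driver context */
};

#define MOCK_HW_SLOTS 2

/* Hypothetical hardware SA table with a fixed number of entries. */
struct mock_hw {
	struct mock_sa *slot[MOCK_HW_SLOTS];
};

static int mock_state_add(struct mock_hw *hw, struct mock_sa *sa)
{
	size_t i;

	/* verify the algorithm is supported for offloads */
	if (strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0)
		return -EOPNOTSUPP;

	/* store the SA information and enable the HW offload of the SA */
	for (i = 0; i < MOCK_HW_SLOTS; i++) {
		if (!hw->slot[i]) {
			hw->slot[i] = sa;
			/* remember where the SA lives for the fast path */
			sa->offload_handle = &hw->slot[i];
			return 0;
		}
	}

	/* any other error fails the request */
	return -ENOSPC;
}
```

In this sketch an unsupported algorithm yields -EOPNOTSUPP, which per the
table above lets the stack fall back to SW IPsec for crypto offload, while a
full table yields a different error and fails the request.
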
When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver whether the
offload is serviceable.  The callback can check the packet information to
be sure the offload can be supported (e.g. IPv4 or IPv6, no IPv4 options,
etc) and return true or false to signify its support.

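The kind of check xdo_dev_offload_ok() performs can be sketched as follows.
This is a userspace illustration under stated assumptions: mock_offload_ok()
is a hypothetical helper that looks at raw L3 header bytes instead of an skb,
and the imagined hardware accepts IPv4 without options and any IPv6 packet
with a complete fixed header (IPv6 extension headers are not examined here).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide from the L3 header whether the (imagined)
 * hardware can service the offload for this packet.
 */
static bool mock_offload_ok(const uint8_t *l3_hdr, size_t len)
{
	uint8_t version;

	if (len < 1)
		return false;

	/* the IP version is the upper nibble of the first header byte */
	version = l3_hdr[0] >> 4;

	if (version == 4) {
		if (len < 20)
			return false;
		/* IHL counts 32-bit words; 5 means no IPv4 options */
		return (l3_hdr[0] & 0x0f) == 5;
	}

	if (version == 6)
		return len >= 40;	/* fixed IPv6 header is present */

	return false;
}
```

A packet rejected here is not an error: returning false only tells the stack
that this particular packet cannot be handled by the offload.
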
Crypto offload mode:
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

		xs = xfrm_input_state(skb);
		context = xs->xso.offload_handle;
		/* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

	find and hold the SA that was used for the Rx skb::

		get spi, protocol, and destination IP from packet headers
		xs = find xs from (spi, protocol, dest_IP)
		xfrm_state_hold(xs);

	store the state information into the skb::

		sp = secpath_set(skb);
		if (!sp) return;
		sp->xvec[sp->len++] = xs;
		sp->olen++;

	indicate the success and/or error status of the offload::

		xo = xfrm_offload(skb);
		xo->flags = CRYPTO_DONE;
		xo->status = crypto_status;

	hand the packet to napi_gro_receive() as usual

In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

Packet offload mode:
The HW adds and removes the XFRM headers. In the Rx path, the XFRM stack is
bypassed if the HW reported success. In the Tx path, the packet leaves the
kernel unencrypted and without the extra headers; the HW is responsible for
adding the headers and encrypting the packet.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

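The split between the delete and free callbacks can be illustrated with a
small userspace model.  Everything here is hypothetical: struct mock_offload
and the mock_*() helpers are not kernel interfaces, and the reference count
is a plain integer standing in for the references the stack holds on a state.

```c
#include <stdbool.h>

/* Hypothetical per-SA driver context; not a kernel structure. */
struct mock_offload {
	int refcnt;		/* references still held on the state */
	bool hw_enabled;	/* SA is programmed and active in hardware */
	bool resources_held;	/* driver resources are still allocated */
};

/* Stand-in for xdo_dev_state_delete(): disable the offload. */
static void mock_state_delete(struct mock_offload *o)
{
	o->hw_enabled = false;
}

/* Stand-in for xdo_dev_state_free(): clear the remaining resources. */
static void mock_state_free(struct mock_offload *o)
{
	o->resources_held = false;
}

/* Drop one reference; the free step runs only after the last one is gone. */
static void mock_state_put(struct mock_offload *o)
{
	if (--o->refcnt == 0)
		mock_state_free(o);
}
```

In this model the delete step takes effect immediately, while resources
survive until the final reference is dropped, which is why a driver may need
to keep per-SA data valid between the two calls.
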
When a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states.

Because the HW handles the packets, the XFRM core cannot count towards the
hard and soft limits.  The HW/driver is responsible for doing so and for
providing accurate data when xdo_dev_state_update_curlft() is called. If one
of these limits is reached, the driver needs to call
xfrm_state_check_expire() to make sure that XFRM performs the rekeying
sequence.
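
The limit check itself can be sketched in userspace C as below.  The
structures and mock_check_expire() are hypothetical simplifications: the
counters stand for values a driver would read back from the HW, and the
limits for the SA's configured soft and hard byte and packet limits.

```c
#include <stdint.h>

/* Hypothetical copy of the configured lifetime limits of an SA. */
struct mock_lifetime_cfg {
	uint64_t soft_byte_limit;
	uint64_t hard_byte_limit;
	uint64_t soft_packet_limit;
	uint64_t hard_packet_limit;
};

/* Hypothetical counters as reported by the hardware. */
struct mock_lifetime_cur {
	uint64_t bytes;
	uint64_t packets;
};

enum mock_expiry { MOCK_LFT_OK, MOCK_LFT_SOFT, MOCK_LFT_HARD };

/* Compare the HW counters against the limits, hard limits first. */
static enum mock_expiry mock_check_expire(const struct mock_lifetime_cur *cur,
					  const struct mock_lifetime_cfg *cfg)
{
	if (cur->bytes >= cfg->hard_byte_limit ||
	    cur->packets >= cfg->hard_packet_limit)
		return MOCK_LFT_HARD;

	if (cur->bytes >= cfg->soft_byte_limit ||
	    cur->packets >= cfg->soft_packet_limit)
		return MOCK_LFT_SOFT;

	return MOCK_LFT_OK;
}
```

A soft result corresponds to the point where rekeying should start, and a
hard result to the point where the SA must no longer be used.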