.. SPDX-License-Identifier: GPL-2.0

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to give the stack access to the
hardware offload.

The kernel currently supports two types of hardware offload:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC keep the SA and policy in sync
   * NIC handles the SA and policy states
   * Kernel talks to the key manager

Userland access to the offload is typically through a system such as
libreswan or KAME/raccoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
          /* Crypto and Packet offload callbacks */
          int     (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
          void    (*xdo_dev_state_delete) (struct xfrm_state *x);
          void    (*xdo_dev_state_free) (struct xfrm_state *x);
          bool    (*xdo_dev_offload_ok) (struct sk_buff *skb,
                                         struct xfrm_state *x);
          void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);

          /* Solely packet offload callbacks */
          void    (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
          int     (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
          void    (*xdo_dev_policy_delete) (struct xfrm_policy *x);
          void    (*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

A NIC driver offering IPsec offload needs to implement the callbacks
relevant to the offload types it supports in order to make the offload
available to the network stack's XFRM
subsystem.  Additionally, the feature bits NETIF_F_HW_ESP and
NETIF_F_HW_ESP_TX_CSUM signal the availability of the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

        /* as done by the ixgbe driver */
        adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
        adapter->netdev->features |= NETIF_F_HW_ESP;
        adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.
The driver should:

  - verify the algorithm is supported for offloads
  - store the SA information (key, salt, target-ip, protocol, etc)
  - enable the HW offload of the SA
  - return status value:

    ===========   ===================================
    0             success
    -EOPNOTSUPP   offload not supported, try SW IPsec,
                  not applicable for packet offload mode
    other         fail the request
    ===========   ===================================

The driver can also set an offload_handle in the SA, an opaque value
that can be used to convey context into the fast-path offload requests::

        xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  This can check the packet information to be sure the
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
return true or false to signify its support.
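
As an illustration only, the kind of header checks described above can be
modeled in userspace C.  This is not kernel code: demo_offload_ok() and
DEMO_IPPROTO_ESP are hypothetical names, and the function inspects a raw
packet buffer, whereas a real xdo_dev_offload_ok() receives the skb and the
xfrm_state.  The sketch assumes hardware that accepts IPv4 without options
and IPv6 without extension headers, with ESP (protocol 50) directly after the
IP header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPPROTO_ESP 50	/* IANA protocol number for ESP */

/*
 * Hypothetical userspace model of an xdo_dev_offload_ok() decision.
 * Assumes HW that handles IPv4 without options and IPv6 without
 * extension headers, with ESP directly after the IP header.
 */
static bool demo_offload_ok(const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL || len == 0);

	if (len < 1)
		return false;

	switch (pkt[0] >> 4) {	/* IP version nibble */
	case 4:
		/* IHL counts 32-bit words; 5 means a 20-byte header, no options. */
		return len >= 20 && (pkt[0] & 0x0f) == 5 &&
		       pkt[9] == DEMO_IPPROTO_ESP;
	case 6:
		/* Next Header (byte 6) must be ESP, i.e. no extension headers. */
		return len >= 40 && pkt[6] == DEMO_IPPROTO_ESP;
	default:
		return false;
	}
}
```

In a driver, the same decision would typically be made from the skb's
network header (for example via ip_hdr(skb)) rather than from a byte buffer.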

Crypto offload mode:
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

        xs = xfrm_input_state(skb);
        context = xs->xso.offload_handle;
        /* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

        find and hold the SA that was used for the Rx skb::

                /* get spi, protocol, and destination IP from packet headers */
                xs = driver_find_rx_state(spi, protocol, dest_ip); /* hypothetical driver lookup */
                if (!xs)
                        return;
                xfrm_state_hold(xs);

        store the state information into the skb::

                sp = secpath_set(skb);
                if (!sp) {
                        xfrm_state_put(xs);   /* drop the reference taken above */
                        return;
                }
                sp->xvec[sp->len++] = xs;
                sp->olen++;

        indicate the success and/or error status of the offload::

                xo = xfrm_offload(skb);
                xo->flags = CRYPTO_DONE;
                xo->status = crypto_status;   /* e.g. CRYPTO_SUCCESS */

        hand the packet to napi_gro_receive() as usual

In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

Packet offload mode:
The HW adds and removes the XFRM headers, so in the Rx path the XFRM stack
is bypassed if the HW reported success.  In the Tx path, the packet leaves
the kernel unencrypted and without the extra headers; the HW is responsible
for adding them and performing the encryption.
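
The crypto offload Rx steps earlier in this section begin by finding the SA
from the (spi, protocol, destination IP) triple and taking a reference on it.
That lookup can be modeled in userspace C as a sketch.  Everything prefixed
demo_ is hypothetical: a real driver keeps its own SA table, organized however
its hardware reports matches, and takes the reference with xfrm_state_hold().
This model handles IPv4 only and uses a linear search for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPPROTO_ESP 50

/* Hypothetical, heavily simplified stand-in for an offloaded SA. */
struct demo_sa {
	uint32_t spi;		/* host byte order */
	uint8_t proto;
	uint32_t daddr;		/* IPv4 destination, host byte order */
	int refcnt;		/* models the reference xfrm_state_hold() takes */
};

/* Read a 32-bit big-endian (network order) value. */
static uint32_t demo_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Pull (spi, protocol, destination IP) out of an IPv4 + ESP packet,
 * find the matching SA and take a reference on it.
 * Returns NULL if the packet is not IPv4 ESP or no SA matches.
 */
static struct demo_sa *demo_rx_find_sa(const uint8_t *pkt, size_t len,
				       struct demo_sa *table, size_t n)
{
	size_t ihl, i;
	uint32_t daddr, spi;

	assert(table != NULL || n == 0);

	if (len < 20 || (pkt[0] >> 4) != 4)
		return NULL;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IP header length in bytes */
	/* ESP header: 32-bit SPI, then 32-bit sequence number */
	if (ihl < 20 || len < ihl + 8 || pkt[9] != DEMO_IPPROTO_ESP)
		return NULL;

	daddr = demo_be32(pkt + 16);		/* IPv4 destination address */
	spi = demo_be32(pkt + ihl);		/* SPI starts the ESP header */

	for (i = 0; i < n; i++) {
		if (table[i].spi == spi && table[i].proto == pkt[9] &&
		    table[i].daddr == daddr) {
			table[i].refcnt++;
			return &table[i];
		}
	}
	return NULL;
}
```

For instance, with a table entry { .spi = 0x07, .proto = 50, .daddr =
0x0e000046 } (14.0.0.70, as in the ip xfrm example earlier), an IPv4 ESP
packet to 14.0.0.70 carrying SPI 0x07 returns that entry and bumps its refcnt.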

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

When a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states and policies.

Because the HW handles the packets, the XFRM core cannot count them against
the soft and hard lifetime limits.  The HW/driver is responsible for this
accounting and must provide accurate data when xdo_dev_state_update_curlft()
is called.  If one of these limits is reached, the driver needs to call
xfrm_state_check_expire() to make sure that XFRM performs the rekeying
sequence.