.. SPDX-License-Identifier: GPL-2.0
.. _xfrm_device:

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to offer to the stack access to the
hardware offload.

Right now, there are two types of hardware offload that the kernel supports:

* IPsec crypto offload:

  * NIC performs encrypt/decrypt
  * Kernel does everything else

* IPsec packet offload:

  * NIC performs encrypt/decrypt
  * NIC does encapsulation
  * Kernel and NIC have SA and policy in-sync
  * NIC handles the SA and policy states
  * The kernel talks to the key manager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
        /* Crypto and Packet offload callbacks */
        int     (*xdo_dev_state_add) (struct xfrm_state *x, struct netlink_ext_ack *extack);
        void    (*xdo_dev_state_delete) (struct xfrm_state *x);
        void    (*xdo_dev_state_free) (struct xfrm_state *x);
        bool    (*xdo_dev_offload_ok) (struct sk_buff *skb,
                                       struct xfrm_state *x);
        void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);

        /* Solely packet offload callbacks */
        void    (*xdo_dev_state_update_curlft) (struct xfrm_state *x);
        int     (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
        void    (*xdo_dev_policy_delete) (struct xfrm_policy *x);
        void    (*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

The NIC driver offering IPsec offload will need to implement the callbacks
relevant to the offload types it supports to make the offload available to
the network stack's XFRM
subsystem.  Additionally, the feature bits NETIF_F_HW_ESP and
NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

        adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
        adapter->netdev->features |= NETIF_F_HW_ESP;
        adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.
The driver should:

 - verify the algorithm is supported for offloads
 - store the SA information (key, salt, target-ip, protocol, etc)
 - enable the HW offload of the SA
 - return status value:

        ===========   ===================================
        0             success
        -EOPNOTSUPP   offload not supported, try SW IPsec,
                      not applicable for packet offload mode
        other         fail the request
        ===========   ===================================

The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests::

        xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  This can check the packet information to be sure the
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
return true or false to signify its support.
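
The two driver-side checks described above can be illustrated with a small
userspace model.  This is only a sketch under stated assumptions: the
``demo_*`` names are hypothetical, the "hardware" is assumed to support only
``rfc4106(gcm(aes))`` with 128-bit keys, and raw header bytes stand in for the
``struct sk_buff`` and ``struct xfrm_state`` that a real driver would inspect.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the algorithm check in xdo_dev_state_add().
 * Assumption: the imaginary NIC only handles rfc4106(gcm(aes)) with a
 * 128-bit key.  Returns 0 on success or -EOPNOTSUPP so that the caller
 * could fall back to software IPsec (crypto offload mode only).
 */
int demo_state_add(const char *aead_name, unsigned int key_bits)
{
    assert(aead_name != NULL);

    if (strcmp(aead_name, "rfc4106(gcm(aes))") != 0 || key_bits != 128)
        return -EOPNOTSUPP;

    return 0;
}

/*
 * Hypothetical stand-in for xdo_dev_offload_ok(): accept IPv4 without
 * options and IPv6, reject everything else.  l3hdr points at the first
 * byte of the IP header, len is the number of readable bytes.
 */
bool demo_offload_ok(const uint8_t *l3hdr, size_t len)
{
    assert(l3hdr != NULL);

    if (len < 1)
        return false;

    switch (l3hdr[0] >> 4) {            /* IP version nibble */
    case 4:
        /* IHL of 5 words (20 bytes) means no IPv4 options */
        return len >= 20 && (l3hdr[0] & 0x0f) == 5;
    case 6:
        /* fixed IPv6 header is 40 bytes */
        return len >= 40;
    default:
        return false;
    }
}
```

A real driver would typically also look at IPv6 extension headers and at any
other per-packet constraints of its hardware before returning true.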

Crypto offload mode:

When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

        xs = xfrm_input_state(skb);
        context = xs->xso.offload_handle;
        /* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

    find and hold the SA that was used for the Rx skb::

        /* get spi, protocol, and destination IP from packet headers */
        /* xs = find xs from (spi, protocol, dest_IP) */
        xfrm_state_hold(xs);

    store the state information into the skb::

        sp = secpath_set(skb);
        if (!sp)
                return;
        sp->xvec[sp->len++] = xs;
        sp->olen++;

    indicate the success and/or error status of the offload::

        xo = xfrm_offload(skb);
        xo->flags = CRYPTO_DONE;
        xo->status = crypto_status;

    hand the packet to napi_gro_receive() as usual

In ESN mode, xdo_dev_state_advance_esn() is called from xfrm_replay_advance_esn().
The driver will check the packet sequence number and update the HW ESN state
machine if needed.

Packet offload mode:

The HW adds and removes the XFRM headers.  In the Rx path, the XFRM stack is
bypassed if the HW reported success.  In the Tx path, the packet leaves the
kernel unencrypted and without the extra headers; the HW is responsible for
adding the headers and performing the encryption.
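
The division of Tx work between the kernel and the NIC in the two modes can
be summarized in a small userspace model.  This is a sketch for illustration
only: the ``demo_*`` type and function names are hypothetical and are not the
kernel's own enums or helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the offload modes described in this document */
enum demo_offload_type {
    DEMO_OFFLOAD_NONE,      /* plain software IPsec */
    DEMO_OFFLOAD_CRYPTO,    /* NIC encrypts, kernel does everything else */
    DEMO_OFFLOAD_PACKET,    /* NIC encrypts and encapsulates */
};

/* Who performs each Tx step for a given mode */
struct demo_tx_split {
    bool kernel_adds_esp_headers;
    bool kernel_encrypts;
    bool nic_adds_esp_headers;
    bool nic_encrypts;
};

struct demo_tx_split demo_tx_split_for(enum demo_offload_type type)
{
    /* Default: software IPsec, the kernel does all the work */
    struct demo_tx_split s = { true, true, false, false };

    switch (type) {
    case DEMO_OFFLOAD_CRYPTO:
        /* headers are already in the packet data, HW only encrypts */
        s.kernel_encrypts = false;
        s.nic_encrypts = true;
        break;
    case DEMO_OFFLOAD_PACKET:
        /* packet leaves the kernel as plaintext without ESP headers */
        s.kernel_adds_esp_headers = false;
        s.kernel_encrypts = false;
        s.nic_adds_esp_headers = true;
        s.nic_encrypts = true;
        break;
    case DEMO_OFFLOAD_NONE:
        break;
    }

    /* each step is done exactly once, by the kernel or by the NIC */
    assert(s.kernel_encrypts != s.nic_encrypts);
    assert(s.kernel_adds_esp_headers != s.nic_adds_esp_headers);
    return s;
}
```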

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

As a netdev is set to DOWN the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states.

Because the HW handles the packets, the XFRM core can't count the hard and
soft limits itself.  The HW/driver is responsible for counting them and for
providing accurate data when xdo_dev_state_update_curlft() is called.  When
one of these limits is reached, the driver needs to call
xfrm_state_check_expire() to make sure that XFRM performs the rekeying
sequence.