History log of /openbmc/qemu/net/af-xdp.c (Results 1 – 6 of 6)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v9.2.0, v9.1.2, v9.1.1, v9.1.0
# 0cf74ffe 25-Mar-2024 Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-target-arm-20240325-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Fixes for seven minor coverity issues

# -----BEGIN PGP SIGNATURE-----
#
#

Merge tag 'pull-target-arm-20240325-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Fixes for seven minor coverity issues

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmYBh5wZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3lb8D/9XDbRFB3kIHVBaDxZyE4bs
# QH8u80C08f/PzJ5SQos5D+R07xtPid1dyeiLND/RvwZUN3WAGKf9pmPUQL4aluz5
# gHMalq/+nGNam2qz+tKTI0q0otndiJrGNlOYhw2QqFJ9GUp2T9e61izgw0XeQtzF
# GKm6aE8LytH7h2H9ndIpJFQDggqkQev/uZ625hwhYxo0ND5uRqBNE7Wjy104DULo
# oEGZBhIB2CtyDiQdxgCfC8TOXVT3NAEbk6carbYdGshrMTpWNsjOHbLVcsuqUaZC
# eeRnOprsQq+YE5aAByfipGgCuoGNE5rn6ZTrDpSdfLe8LFfU/hEASnOmIjMtMbSM
# HKhKcKKzvLk/KQZZNJCbh+MKl1GsTvXMrB/DjLaVu2643MyQY7XZu3/XX3PE6Zee
# WqJC+NazfXCdHDyYqfPELkmnpeS5Tka/PCoku1VNWmnr7Qr6SYIqzbxI+zCsbDCs
# uqDfxzwN1lTKCkgUD3SVQrmrQ3u9nTLCpTqmaEd6H3+0UgpEUBpW51bMPUxO3KIk
# ouvjVJ3oDSdNMyVrEl3zDoxykU99trRYbIRALrW+rd1ghn4SE0WorAGJ96GLGYP0
# QfFtveTmDqsfKOvxHfBx6gng0aQw0GK145uXLciRaPuX51wZGbAjp/Muhs6oswtR
# j7GgfYAbVdc1QwKTqBK0tw==
# =0H37
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 25 Mar 2024 14:18:04 GMT
# gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg: issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg: aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20240325-1' of https://git.linaro.org/people/pmaydell/qemu-arm:
tests/qtest/libqtest.c: Check for g_setenv() failure
tests/unit/test-throttle: Avoid unintended integer division
hw/nvram/mac_nvram: Report failure to write data
hw/misc/pca9554: Correct error check bounds in get/set pin functions
net/af-xdp.c: Don't leak sock_fds array in net_init_af_xdp()
tests/unit/socket-helpers: Don't close(-1)
tests/qtest/npcm7xx_emc_test: Don't leak cmd_line

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# bed150be 25-Mar-2024 Peter Maydell <peter.maydell@linaro.org>

net/af-xdp.c: Don't leak sock_fds array in net_init_af_xdp()

In net_init_af_xdp() we parse the arguments and allocate
a buffer of ints into sock_fds. However, although we
free this in the error exit

net/af-xdp.c: Don't leak sock_fds array in net_init_af_xdp()

In net_init_af_xdp() we parse the arguments and allocate
a buffer of ints into sock_fds. However, although we
free this in the error exit path, we don't ever free it
in the successful return path. Coverity spots this leak.

Switch to g_autofree so we don't need to manually free the
array.

Resolves: Coverity CID 1534906
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20240312183810.557768-4-peter.maydell@linaro.org

show more ...


# 14639717 31-Jan-2024 Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging

trivial patches for 2024-01-31

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmW6NSc

Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into staging

trivial patches for 2024-01-31

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmW6NScPHG1qdEB0bHMu
# bXNrLnJ1AAoJEHAbT2saaT5ZdQYH/2fhfhZotH0V2qAcMxlOoHbAE9UhZNRsSYtf
# QFP0GXFYFAMm7LHkPUbvKgO7LylKWAOMn/zKZqgj1Vf1EpoKQ2FwLtR/buDz86Ec
# pi2OrDPRA7Ay5c3ow3YZZkUOhQTTcR5rNjYctPtt/J4j8ol/z5vre7weJIg2bCJe
# zI7vIVg7iFFzbkXY20KHngJ5nDC+aEm7WaGlxAP8kfkvy324Wy9O2k8qu2J5zbLT
# HGvh3rwEDvRTYe4CaKFFHWNV0m4092HAr/dJBobugI5VZ6QQpK6Tgy8N+4ZrCHD2
# SjUKeym85VTOYGuY8b18fk5MQK2SzsfBUJ4x8VGC75W4mJ8agdc=
# =HImO
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 31 Jan 2024 11:55:19 GMT
# gpg: using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59
# gpg: issuer "mjt@tls.msk.ru"
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full]
# gpg: aka "Michael Tokarev <mjt@corpit.ru>" [full]
# gpg: aka "Michael Tokarev <mjt@debian.org>" [full]
# Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5
# Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59

* tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu: (21 commits)
hw/hyperv: Include missing headers
hw/intc/xics: Include missing 'cpu.h' header
hw/arm: Add `\n` to hint message
hw/loongarch: Add `\n` to hint message
hw/i386: Add `\n` to hint message
backends/hostmem: Fix block comments style (checkpatch.pl warnings)
misc: Clean up includes
riscv: Clean up includes
cxl: Clean up includes
include: Clean up includes
m68k: Clean up includes
acpi: Clean up includes
aspeed: Clean up includes
disas/riscv: Clean up includes
hyperv: Clean up includes
scripts/clean-includes: Update exclude list
mailmap: Fix Stefan Weil email
qemu-docs: Update options for graphical frontends
qapi/migration.json: Fix the member name for MigrationCapability
colo: examples: remove mentions of script= and (wrong) downscript=
...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

show more ...


# 493bc2db 25-Jan-2024 Peter Maydell <peter.maydell@linaro.org>

misc: Clean up includes

This commit was created with scripts/clean-includes:
./scripts/clean-includes --git misc net/af-xdp.c plugins/*.c audio/pwaudio.c util/userfaultfd.c

All .c should include q

misc: Clean up includes

This commit was created with scripts/clean-includes:
./scripts/clean-includes --git misc net/af-xdp.c plugins/*.c audio/pwaudio.c util/userfaultfd.c

All .c should include qemu/osdep.h first. The script performs three
related cleanups:

* Ensure .c files include qemu/osdep.h first.
* Including it in a .h is redundant, since the .c already includes
it. Drop such inclusions.
* Likewise, including headers qemu/osdep.h includes is redundant.
Drop these, too.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

show more ...


# dd0c8498 19-Sep-2023 Stefan Hajnoczi <stefanha@redhat.com>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJlB/SLAAoJEO8Ells5jWIR7EQH/1kAbxHcSGJXDOgQAXJ/rOZi

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# -----BEGIN PGP SIGNATURE-----
# Version: GnuPG v1
#
# iQEcBAABAgAGBQJlB/SLAAoJEO8Ells5jWIR7EQH/1kAbxHcSGJXDOgQAXJ/rOZi
# UKn3ugJzD0Hxd4Xz8cvdVLM+9/JoEEOK1uB+NIG7Ask/gA5D7eUYzaLtp1OJ8VNO
# mamfKmn3EIBWJoLSHH19TKzfW2tGMJHQ0Nj+sbDQRkK5f2c7hwLTRXa1EmlJd4dB
# VoVzX4OiJtrQyv4OVmpP/PSETXJDvYYX/DNcRl9/3ccKtQW/wVDI3YzrMzXrsgyc
# w9ItJi8k+19mVH6RgQwciqRvTbVMdzkOxqvU//LY0TxnjsHfbyHr+KlNAa2WTY2N
# QgpAlMZhHqUG6/XXAs0o2VEtA66zmw932Xfy/CZUEcdGWfkG/9CEVfbuT4CKGY4=
# =tF7K
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 18 Sep 2023 02:56:11 EDT
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [full]
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu:
net/tap: Avoid variable-length array
net/dump: Avoid variable length array
hw/net/rocker: Avoid variable length array
hw/net/fsl_etsec/rings.c: Avoid variable length array
net: add initial support for AF_XDP network backend
tests: bump libvirt-ci for libasan and libxdp
e1000e: rename e1000e_ba_state and e1000e_write_hdr_to_rx_buffers
igb: packet-split descriptors support
igb: add IPv6 extended headers traffic detection
igb: RX payload guest writting refactoring
igb: RX descriptors guest writting refactoring
igb: rename E1000E_RingInfo_st
igb: remove TCP ACK detection
virtio-net: Add support for USO features
virtio-net: Add USO flags to vhost support.
tap: Add check for USO features
tap: Add USO support to tap device.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

show more ...


# cb039ef3 13-Sep-2023 Ilya Maximets <i.maximets@ovn.org>

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the ke

net: add initial support for AF_XDP network backend

AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack. In the essence, the technology is
pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets. Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket. A chunk of userspace memory
is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data. Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring. After transmission, device will
return the buffer via Completion ring. On Rx, device will take
a buffer form a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

-device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
-netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface. It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
driver support. With a caveat of lower performance.

2. native - this does require support from the driver and allows to
bypass skb allocation in the kernel and potentially use
zero-copy while getting packets in/out userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option. To force 'copy' even in native
mode, use 'force-copy=on' option. This might be useful if there is
some issue with the driver.

Option 'queues=N' allows to specify how many device queues should
be open. Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU. So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues. It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs. See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps. It is possible, however,
to run with no capabilities. For that to work, an external process
with enough capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option. Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.

There are few performance challenges with the current network backends.

First is that they do not support IO threads. This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work. This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance. The fastest
"frontend" device is virtio-net. But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa). In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory. Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations. AF_XDP sockets do not
support any kinds of checksum or segmentation offloading. Buffers
are limited to a page size (4K), i.e. MTU is limited. Multi-buffer
support implementation for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall. That doesn't allow high packet rates on virtual
interfaces.

However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on top
of a physical NIC with zero-copy support.

Test setup:

2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy. NIC is configured to use 1 queue.

Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.

iperf3 result:
TCP stream : 19.1 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 3.4 Mpps
Rx only : 2.0 Mpps
L2 FWD Loopback : 1.5 Mpps

In skb mode the same setup shows much lower performance, similar to
the setup where pair of physical NICs is replaced with veth pair:

iperf3 result:
TCP stream : 9 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
Tx only : 1.2 Mpps
Rx only : 1.0 Mpps
L2 FWD Loopback : 0.7 Mpps

Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>

show more ...