======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware. For example, to use
  a Mellanox HCA, you will need the ib_mthca kernel module and the
  libmthca userspace driver installed.

User-kernel communication
=========================

  Userspace communicates with the kernel for slow-path resource
  management operations via the /dev/infiniband/uverbsN character
  devices. Fast-path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.

Resource management
===================

  Since creation and destruction of all IB resources are done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context. The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.

  This also allows the kernel to clean up when a process exits and to
  prevent one process from touching another process's resources.

Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.
  The ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls. It also accounts for the
  amount of memory pinned in the process's pinned_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of pinned_vm may be an overestimate of the
  number of pages pinned by a process.

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used. This will create device nodes named::

    /dev/infiniband/uverbs0

  and so on. Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.