======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware.  For example, to use
  a Mellanox InfiniHost HCA, the ib_mthca kernel module and the
  libmthca userspace driver must be installed.

User-kernel communication
=========================

  Userspace communicates with the kernel for slow-path resource
  management operations via the /dev/infiniband/uverbsN character
  devices.  Fast-path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.
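
  The fast path can be sketched in C as a simulation, not a provider
  implementation.  Assumptions: an anonymous mmap() stands in for the
  device register page that a real userspace driver would typically map
  from the uverbs file descriptor, and struct fake_doorbell_page and the
  helper names are hypothetical.  What the sketch shows is that, once
  the mapping exists, ringing a doorbell is an ordinary store with no
  system call.

```c
#define _DEFAULT_SOURCE 1
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical doorbell layout, for illustration only. */
struct fake_doorbell_page {
    volatile uint32_t sq_doorbell; /* producer index for a send queue */
    volatile uint32_t cq_doorbell; /* consumer index for a completion queue */
};

/* Stand-in for mapping device registers: anonymous memory is mapped so
 * the sketch runs without hardware.  Returns NULL on failure. */
static struct fake_doorbell_page *map_fake_doorbells(void)
{
    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (struct fake_doorbell_page *)p;
}

/* Fast path: posting work is a plain store into the mapping. */
static void ring_sq_doorbell(struct fake_doorbell_page *db, uint32_t prod_idx)
{
    db->sq_doorbell = prod_idx;
}

static void unmap_fake_doorbells(struct fake_doorbell_page *db)
{
    munmap((void *)db, 4096);
}
```

  Real providers usually also issue a memory barrier before the store
  so the device sees the work request before the doorbell; the sketch
  leaves that out.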

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.
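
  A minimal C sketch of this command/response pattern follows.  The
  struct layouts, DEMO_CMD_ALLOC_PD and demo_kernel_write() are
  hypothetical and only loosely modeled on a header-plus-payload shape;
  ib_user_verbs.h holds the real definitions.  A plain function stands
  in for the kernel's write() handler: it reads the 64-bit response
  field, copies the reply to that address, and returns the byte count
  or a negative errno (which userspace would see as write() returning
  -1 with errno set).

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout, NOT the real uverbs ABI. */
struct demo_cmd_hdr {
    uint32_t command;   /* which verb to run */
    uint16_t in_words;  /* command size in 32-bit words */
    uint16_t out_words; /* response buffer size in 32-bit words */
};

struct demo_alloc_pd_cmd {
    struct demo_cmd_hdr hdr;
    uint64_t response;  /* userspace address of the output buffer */
};

struct demo_alloc_pd_resp {
    uint32_t pd_handle; /* opaque handle chosen by the "kernel" */
};

enum { DEMO_CMD_ALLOC_PD = 1 };

/* Simulated kernel side of write(): validate the command, write the
 * reply through the 64-bit pointer, return bytes consumed or -errno. */
static long demo_kernel_write(const void *buf, size_t len)
{
    struct demo_alloc_pd_cmd cmd;
    struct demo_alloc_pd_resp resp = { .pd_handle = 42 }; /* arbitrary */

    if (len < sizeof(cmd))
        return -EINVAL;
    memcpy(&cmd, buf, sizeof(cmd));
    if (cmd.hdr.command != DEMO_CMD_ALLOC_PD)
        return -EINVAL;
    if (cmd.hdr.out_words * 4u < sizeof(resp))
        return -ENOSPC;
    /* The kernel would use copy_to_user(); memcpy stands in here. */
    memcpy((void *)(uintptr_t)cmd.response, &resp, sizeof(resp));
    return (long)len;
}
```

  A caller fills in the command, points response at a local struct
  demo_alloc_pd_resp, and hands the command to demo_kernel_write(), as
  a real program would pass its command to write() on an open uverbs
  file descriptor.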

Resource management
===================

  Since creation and destruction of all IB resources are done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context.  The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.

  This also allows the kernel to clean up when a process exits and
  prevent one process from touching another process's resources.
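
  The handle translation described above can be sketched in C.  This is
  an illustration under stated assumptions, not the ib_uverbs code: a
  single fixed-size array stands in for the idr tables, and the
  demo_ucontext/demo_uobject types and helper names are hypothetical.
  Lookups validate the handle and its owning context instead of
  dereferencing anything userspace supplied, and a per-context sweep
  models cleanup when a process exits.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_HANDLES 64

struct demo_ucontext { int id; };   /* hypothetical per-process context */

struct demo_uobject {
    void *kobj;                     /* kernel pointer, never given out */
    struct demo_ucontext *owner;    /* context that created the object */
};

static struct demo_uobject demo_table[DEMO_MAX_HANDLES];

/* Store a kernel pointer; return a small opaque handle, or -1 if full. */
static int demo_handle_alloc(struct demo_ucontext *ctx, void *kobj)
{
    for (int h = 0; h < DEMO_MAX_HANDLES; h++) {
        if (demo_table[h].kobj == NULL) {
            demo_table[h].kobj = kobj;
            demo_table[h].owner = ctx;
            return h;
        }
    }
    return -1;
}

/* Translate a handle from userspace; reject out-of-range, unused, or
 * foreign handles rather than trusting the value. */
static void *demo_handle_lookup(struct demo_ucontext *ctx, uint32_t handle)
{
    if (handle >= DEMO_MAX_HANDLES)
        return NULL;
    if (demo_table[handle].kobj == NULL || demo_table[handle].owner != ctx)
        return NULL;
    return demo_table[handle].kobj;
}

/* On process exit, release everything the context still owns and
 * return how many objects were released. */
static int demo_context_cleanup(struct demo_ucontext *ctx)
{
    int released = 0;

    for (int h = 0; h < DEMO_MAX_HANDLES; h++) {
        if (demo_table[h].kobj != NULL && demo_table[h].owner == ctx) {
            /* A real driver would destroy the IB object here. */
            demo_table[h].kobj = NULL;
            demo_table[h].owner = NULL;
            released++;
        }
    }
    return released;
}
```

  With two contexts a and b, a handle allocated by a resolves only for
  a; after demo_context_cleanup(&a) it resolves for nobody, while b's
  objects are untouched.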

Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.  The
  ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls.  It also accounts for the
  amount of memory pinned in the process's pinned_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of pinned_vm may be an overestimate of the
  number of pages pinned by a process.
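
  The accounting rule, including the overcount, can be sketched in a
  few lines of C.  The struct demo_mm fields and helper names are
  hypothetical; treating "privileged" as holding CAP_IPC_LOCK is an
  assumption about what bypasses the limit, and the limit is expressed
  in pages for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-process accounting state. */
struct demo_mm {
    uint64_t pinned_vm;     /* pages charged so far */
    uint64_t memlock_pages; /* RLIMIT_MEMLOCK converted to pages */
    bool     privileged;    /* assumed: holds CAP_IPC_LOCK */
};

/* Charge npages for a new registration.  Each registration is charged
 * in full, even if some of its pages are already pinned by another. */
static bool demo_charge_pinned(struct demo_mm *mm, uint64_t npages)
{
    uint64_t new_total = mm->pinned_vm + npages;

    if (!mm->privileged && new_total > mm->memlock_pages)
        return false; /* registration refused; nothing is charged */
    mm->pinned_vm = new_total;
    return true;
}

/* Undo the charge when a registration is released. */
static void demo_uncharge_pinned(struct demo_mm *mm, uint64_t npages)
{
    mm->pinned_vm -= npages;
}
```

  With a 16-page limit, registering the same 10-page buffer twice fails
  on the second attempt even though only 10 distinct pages are
  resident, because each registration is charged separately.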

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used.  This will create device nodes named::

    /dev/infiniband/uverbs0

  and so on.  Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.
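
  After adjusting MODE or GROUP, one way to check the result from an
  unprivileged process is sketched below.  The helper names are
  hypothetical; the code only builds the /dev/infiniband/uverbsN names
  that the rule above creates and asks access() whether the caller may
  read and write them.  Note that access() checks the real, not the
  effective, user and group IDs.

```c
#include <stdio.h>
#include <unistd.h>

/* Build the node name for index n; returns snprintf()'s result. */
static int demo_uverbs_path(char *buf, size_t len, int n)
{
    return snprintf(buf, len, "/dev/infiniband/uverbs%d", n);
}

/* Return 1 if the caller may open node n read/write, else 0 (which
 * covers both "permission denied" and "no such node"). */
static int demo_uverbs_accessible(int n)
{
    char path[64];

    demo_uverbs_path(path, sizeof(path), n);
    return access(path, R_OK | W_OK) == 0;
}
```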