.. include:: <isonum.txt>

=====================
VFIO Mediated devices
=====================

:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
:Author: Neo Jia <cjia@nvidia.com>
:Author: Kirti Wankhede <kwankhede@nvidia.com>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.


Virtual Function I/O (VFIO) Mediated devices[1]
===============================================

The number of use cases for virtualizing DMA devices that do not have built-in
SR-IOV capability is increasing. Previously, to virtualize such devices,
developers had to create their own management interfaces and APIs, and then
integrate them with user space software. To simplify integration with user space
software, we have identified common requirements and a unified management
interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is
an IOMMU/device-agnostic framework for exposing direct device access to user
space in a secure, IOMMU-protected environment. This framework is used for
multiple devices, such as GPUs, network adapters, and compute accelerators. With
direct device access, virtual machines or user space applications have direct
access to the physical device. Mediated devices reuse this framework.

The mediated core driver provides a common interface for mediated device
management that can be used by drivers of different devices. This module
provides a generic interface to perform these operations:

* Create and destroy a mediated device
* Add a mediated device to and remove it from a mediated bus driver
* Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver.
For example, the mediated VFIO mdev driver (vfio_mdev) is designed for mediated
devices and supports the VFIO APIs. This mediated bus driver adds a mediated
device to a VFIO group and removes it from the group.

The following high-level block diagram shows the main components and interfaces
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
devices as examples, as these devices are the first devices to use this module::

     +---------------+
     |               |
     | +-----------+ |  mdev_register_driver() +--------------+
     | |           | +<------------------------+              |
     | |  mdev     | |                         |              |
     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
     | |  driver   | |     probe()/remove()    |              |    APIs
     | |           | |                         +--------------+
     | +-----------+ |
     |               |
     |  MDEV CORE    |
     |   MODULE      |
     |   mdev.ko     |
     | +-----------+ |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
     | |  device   | |  mdev_register_device() +--------------+
     | | interface | +<------------------------+              |
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+


Registration Interfaces
=======================

The mediated core driver provides the following types of registration
interfaces:

* Registration interface for a mediated bus driver
* Physical device driver interface

Registration Interface for a Mediated Bus Driver
------------------------------------------------

The registration interface for a mediated bus driver provides the following
structure to represent a mediated device's driver::

     /*
      * struct mdev_driver [2] - Mediated device's driver
      * @probe: called when a new device is created
      * @remove: called when a device is removed
      * @driver: device driver structure
      */
     struct mdev_driver {
         int  (*probe)  (struct mdev_device *dev);
         void (*remove) (struct mdev_device *dev);
         struct device_driver    driver;
     };

A mediated bus driver passes this structure to the following functions to
register itself with, and unregister itself from, the core driver:

* Register::

    extern int  mdev_register_driver(struct mdev_driver *drv);

* Unregister::

    extern void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding mediated devices to the VFIO
group when devices are bound to the driver and removing mediated devices from
the VFIO group when devices are unbound from the driver.
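
The probe()/remove() contract above can be illustrated outside the kernel. The
following is a minimal user-space model, not kernel code: struct
model_mdev_driver, struct model_mdev_device, model_bind() and model_unbind() are
hypothetical stand-ins for struct mdev_driver, struct mdev_device and the core's
bind and unbind paths, and the in_vfio_group flag only simulates membership in a
VFIO group.

.. code-block:: c

    /* User-space model only; every model_* name is hypothetical. */
    #include <stdio.h>

    struct model_mdev_device {
        char uuid[37];      /* 36-character UUID plus NUL */
        int in_vfio_group;  /* set by probe(), cleared by remove() */
    };

    /* Mirrors the two callbacks of struct mdev_driver. */
    struct model_mdev_driver {
        const char *name;
        int (*probe)(struct model_mdev_device *dev);
        void (*remove)(struct model_mdev_device *dev);
    };

    /* Bind: the bus driver adds the device to a (simulated) VFIO group. */
    static int model_probe(struct model_mdev_device *dev)
    {
        dev->in_vfio_group = 1;
        printf("probe: %s added to VFIO group\n", dev->uuid);
        return 0;
    }

    /* Unbind: the bus driver removes the device from the VFIO group. */
    static void model_remove(struct model_mdev_device *dev)
    {
        dev->in_vfio_group = 0;
        printf("remove: %s removed from VFIO group\n", dev->uuid);
    }

    static struct model_mdev_driver model_vfio_mdev = {
        .name = "model_vfio_mdev",
        .probe = model_probe,
        .remove = model_remove,
    };

    /* Stand-ins for the core invoking the callbacks on bind and unbind. */
    static int model_bind(struct model_mdev_driver *drv,
                          struct model_mdev_device *dev)
    {
        return drv->probe ? drv->probe(dev) : 0;
    }

    static void model_unbind(struct model_mdev_driver *drv,
                             struct model_mdev_device *dev)
    {
        if (drv->remove)
            drv->remove(dev);
    }

A real mediated bus driver would instead pass its struct mdev_driver to
mdev_register_driver() and mdev_unregister_driver().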


Physical Device Driver Interface
--------------------------------

The physical device driver interface provides the mdev_parent_ops[3] structure,
which defines the APIs that the mediated core driver uses to manage work related
to the physical device.

The attribute groups in the mdev_parent_ops structure are as follows:

* dev_attr_groups: attributes of the parent device
* mdev_attr_groups: attributes of the mediated device
* supported_config: attributes to define supported configurations

The functions in the mdev_parent_ops structure are as follows:

* create: allocate basic resources in a driver for a mediated device
* remove: free resources in a driver when a mediated device is destroyed

(Note that mdev-core provides no implicit serialization of create/remove
callbacks per mdev parent device, per mdev type, or any other categorization.
Vendor drivers are expected to be fully asynchronous in this respect or
provide their own internal resource protection.)
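
Because the core does not serialize these callbacks, a vendor driver that
tracks how many instances of a type remain has to guard that count itself. The
sketch below assumes a hypothetical per-type counter, struct vendor_type_state,
and uses a POSIX pthread mutex as a user-space stand-in for the kernel mutex a
real driver would hold; returning -ENOSPC when no instance is left is likewise
an assumption.

.. code-block:: c

    #include <errno.h>
    #include <pthread.h>

    /* Hypothetical per-type bookkeeping kept by a vendor driver. */
    struct vendor_type_state {
        pthread_mutex_t lock;  /* user-space stand-in for a kernel mutex */
        int available;         /* what available_instances would report */
    };

    /* Create path: claim one instance, or fail if none are left. */
    static int vendor_type_reserve(struct vendor_type_state *t)
    {
        int ret = 0;

        pthread_mutex_lock(&t->lock);
        if (t->available > 0)
            t->available--;
        else
            ret = -ENOSPC;
        pthread_mutex_unlock(&t->lock);
        return ret;
    }

    /* Remove path: return the instance to the pool. */
    static void vendor_type_release(struct vendor_type_state *t)
    {
        pthread_mutex_lock(&t->lock);
        t->available++;
        pthread_mutex_unlock(&t->lock);
    }

The check and the decrement happen under the same lock, so two concurrent
create calls cannot both claim the last instance.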

The callbacks in the mdev_parent_ops structure are as follows:

* open: open callback of the mediated device
* close: close callback of the mediated device
* ioctl: ioctl callback of the mediated device
* read: read emulation callback
* write: write emulation callback
* mmap: mmap emulation callback
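
The read and write callbacks emulate device registers in software instead of
passing accesses through to hardware. The user-space sketch below models that
idea over a hypothetical 256-byte region; struct model_vdev, model_read() and
model_write() are illustrative names with simplified signatures, not the kernel
prototypes. Reads are clamped to the end of the region, and writes to the first
four bytes, treated here as read-only ID fields, are ignored.

.. code-block:: c

    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>

    #define MODEL_REGION_SIZE 256  /* hypothetical emulated region size */
    #define MODEL_RO_BYTES    4    /* hypothetical read-only ID bytes */

    struct model_vdev {
        uint8_t regs[MODEL_REGION_SIZE];  /* software copy of the registers */
    };

    /* Emulated read: serve data from the software copy, clamped to the end. */
    static ssize_t model_read(const struct model_vdev *v, void *buf,
                              size_t count, off_t pos)
    {
        if (pos < 0 || (size_t)pos >= MODEL_REGION_SIZE)
            return -1;
        if (count > MODEL_REGION_SIZE - (size_t)pos)
            count = MODEL_REGION_SIZE - (size_t)pos;
        memcpy(buf, v->regs + pos, count);
        return (ssize_t)count;
    }

    /* Emulated write: update the software copy, skipping read-only bytes. */
    static ssize_t model_write(struct model_vdev *v, const void *buf,
                               size_t count, off_t pos)
    {
        const uint8_t *src = buf;
        size_t i;

        if (pos < 0 || (size_t)pos >= MODEL_REGION_SIZE)
            return -1;
        if (count > MODEL_REGION_SIZE - (size_t)pos)
            count = MODEL_REGION_SIZE - (size_t)pos;
        for (i = 0; i < count; i++) {
            size_t off = (size_t)pos + i;

            if (off >= MODEL_RO_BYTES)
                v->regs[off] = src[i];
        }
        return (ssize_t)count;  /* the whole write is reported as consumed */
    }

For example, seeding regs[0] to regs[3] with 0x48, 0x43, 0x53 and 0x32 (the ID
bytes in the sample lspci dump later in this document) keeps those values fixed
no matter what the guest writes.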

A driver should use the mdev_parent_ops structure in the function call to
register itself with the mdev core driver::

    extern int  mdev_register_device(struct device *dev,
                                     const struct mdev_parent_ops *ops);

To unregister itself from the mdev core driver, a driver passes only its struct
device; the mdev_parent_ops structure is not needed::

    extern void mdev_unregister_device(struct device *dev);

Mediated Device Management Interface Through sysfs
==================================================

The management interface through sysfs enables user space software, such as
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
This management interface provides flexibility to the underlying physical
device's driver to support features such as:

* Mediated device hot plug
* Multiple mediated devices in a single virtual machine
* Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory
-------------------------------------

The /sys/class/mdev_bus/ directory contains links to devices that are registered
with the mdev core driver.
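
A management tool can enumerate the registered parent devices by listing that
directory. The helper below is a minimal sketch: list_mdev_parents() is a
hypothetical name, and it prints each entry other than "." and ".." and returns
the number printed, or -1 if the directory cannot be opened (for example, on a
system where it does not exist).

.. code-block:: c

    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Print every entry of dir except "." and "..", and return how many
     * were printed, or -1 if dir cannot be opened.
     */
    static int list_mdev_parents(const char *dir)
    {
        DIR *d = opendir(dir);
        struct dirent *e;
        int n = 0;

        if (!d)
            return -1;
        while ((e = readdir(d)) != NULL) {
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            printf("%s\n", e->d_name);
            n++;
        }
        closedir(d);
        return n;
    }

On a host with an mdev-capable parent driver loaded, a caller would pass
"/sys/class/mdev_bus" as the directory.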

Directories and Files Under the sysfs for Each Physical Device
--------------------------------------------------------------

::

  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]

* [mdev_supported_types]

  The list of currently supported mediated device types and their details.

  [<type-id>], device_api, and available_instances are mandatory attributes
  that should be provided by the vendor driver.

* [<type-id>]

  The [<type-id>] name is created by adding the device driver string as a prefix
  to the string provided by the vendor driver. The format of this name is as
  follows::

      sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  (Outside of the core mdev code, mdev_parent_dev(mdev) can be used in place of
  parent->dev to obtain the parent device.)

* device_api

  This attribute should show which device API is being created, for example,
  "vfio-pci" for a PCI device.

* available_instances

  This attribute should show the number of devices of type <type-id> that can be
  created.

* [devices]

  This directory contains links to the devices of type <type-id> that have been
  created.

* name

  This optional attribute should show a human-readable name for the type.

* description

  This optional attribute should show a brief description of the type and its
  features.
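
The naming rule and the available_instances attribute can both be exercised
from user space. In the sketch below, format_type_id() applies the same "%s-%s"
format as the sprintf() call above, and read_available_instances() parses the
count from an available_instances file. Both helper names are hypothetical, and
the assumption that the file holds a single non-negative decimal number is an
assumption of this sketch.

.. code-block:: c

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Compose "<driver>-<group>"; return -1 if buf is too small. */
    static int format_type_id(char *buf, size_t len,
                              const char *drv, const char *group)
    {
        int n = snprintf(buf, len, "%s-%s", drv, group);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

    /* Parse the count stored in an available_instances file; -1 on error. */
    static long read_available_instances(const char *path)
    {
        char line[32];
        char *end;
        long val;
        FILE *f = fopen(path, "r");

        if (!f)
            return -1;
        if (!fgets(line, sizeof(line), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        val = strtol(line, &end, 10);
        if (end == line || val < 0)
            return -1;
        return val;
    }

With the driver string "mtty" and the vendor string "2", format_type_id()
produces "mtty-2", which matches a type directory in the mtty sample tree later
in this document.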

Directories and Files Under the sysfs for Each mdev Device
----------------------------------------------------------

::

  |- [parent physical device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]
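
Because mdev_type is a symbolic link, the <type-id> of an existing mdev device
can be recovered by reading the link and taking its last path component. The
helper below is a minimal sketch with a hypothetical name, mdev_type_of(); it
relies only on POSIX readlink().

.. code-block:: c

    #ifndef _DEFAULT_SOURCE
    #define _DEFAULT_SOURCE  /* expose readlink()/symlink() under strict -std */
    #endif
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Read <mdev_dir>/mdev_type and copy the link's last path component,
     * the <type-id>, into out. Returns 0 on success or -1 on failure.
     */
    static int mdev_type_of(const char *mdev_dir, char *out, size_t outlen)
    {
        char link[PATH_MAX], target[PATH_MAX];
        const char *base;
        ssize_t n;
        int w;

        w = snprintf(link, sizeof(link), "%s/mdev_type", mdev_dir);
        if (w < 0 || (size_t)w >= sizeof(link))
            return -1;
        n = readlink(link, target, sizeof(target) - 1);
        if (n < 0)
            return -1;
        target[n] = '\0';  /* readlink() does not NUL-terminate */
        base = strrchr(target, '/');
        base = base ? base + 1 : target;
        w = snprintf(out, outlen, "%s", base);
        return (w < 0 || (size_t)w >= outlen) ? -1 : 0;
    }

For the sample device created later in this document, calling mdev_type_of() on
/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 would be expected to
yield mtty-2.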

* remove (write only)

  Writing '1' to the 'remove' file destroys the mdev device. The vendor driver
  can fail the remove() callback if that device is active and the vendor driver
  doesn't support hot unplug.

  Example::

      # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove
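
The same removal can be requested from a program. Because the vendor driver may
fail the remove() callback, the write itself can fail, so the sketch below
returns a negative errno value instead of discarding the error.
sysfs_write_str() is a hypothetical helper name, not part of any kernel or
library API.

.. code-block:: c

    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Write val to the attribute at path (for example "1" to a remove file).
     * Returns 0 on success or a negative errno value on failure.
     */
    static int sysfs_write_str(const char *path, const char *val)
    {
        size_t len = strlen(val);
        ssize_t n;
        int err;
        int fd = open(path, O_WRONLY);

        if (fd < 0)
            return -errno;
        n = write(fd, val, len);
        err = errno;  /* save before close() can overwrite it */
        close(fd);
        if (n < 0)
            return -err;
        return (size_t)n == len ? 0 : -EIO;
    }

A caller would pass /sys/bus/mdev/devices/$MDEV_UUID/remove (with the UUID
filled in) and "1"; a negative return value indicates that the removal was
refused or could not be requested.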

Mediated Device Hot Plug
------------------------

Mediated devices can be created and assigned at runtime. The procedure to hot
plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices
=====================================

The following APIs are provided for translating user PFNs to host PFNs in a VFIO
driver::

    extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
                              int npage, int prot, unsigned long *phys_pfn);

    extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
                                int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules need
to provide these two callback functions.
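
A vendor driver may start from a guest I/O virtual address range rather than a
list of page frames, in which case it has to compute the user_pfn array and the
npage count itself before calling vfio_pin_pages(). The sketch below shows that
arithmetic in user space; iova_range_to_pfns() is a hypothetical helper, and the
4 KiB page size (MODEL_PAGE_SHIFT of 12) is an assumption that a real driver
would replace with PAGE_SHIFT.

.. code-block:: c

    #include <stdint.h>

    #define MODEL_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

    /*
     * Fill pfns[] with the page frame numbers that cover [iova, iova + len)
     * and return the page count, or -1 if the range is empty, wraps around,
     * or needs more than max entries.
     */
    static int iova_range_to_pfns(uint64_t iova, uint64_t len,
                                  unsigned long *pfns, int max)
    {
        uint64_t first, last, count;
        int i;

        if (len == 0 || iova + len - 1 < iova || max <= 0)
            return -1;
        first = iova >> MODEL_PAGE_SHIFT;
        last = (iova + len - 1) >> MODEL_PAGE_SHIFT;
        count = last - first + 1;
        if (count > (uint64_t)max)
            return -1;
        for (i = 0; i < (int)count; i++)
            pfns[i] = (unsigned long)(first + (uint64_t)i);
        return (int)count;
    }

In a driver, the filled array and the returned count would correspond to the
user_pfn and npage arguments of vfio_pin_pages(). Note that a range only a few
bytes long can still span two pages when it crosses a page boundary.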

Using the Sample Code
=====================

The sample driver mtty.c in the samples/vfio-mdev/ directory demonstrates how
to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/.

   Files in this device directory in sysfs are similar to the following::

     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step::

     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm::

     -device vfio-pci,\
      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as follows::

     # lspci -s 00:05.0 -xxvv
     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
             Subsystem: Device 4348:3253
             Physical Slot: 5
             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
             Interrupt: pin A routed to IRQ 10
             Region 0: I/O ports at c150 [size=8]
             Region 1: I/O ports at c158 [size=8]
             Kernel driver in use: serial
     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

   In the Linux guest VM, dmesg output for the device is as follows::

     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A

5. In the Linux guest VM, check the serial ports::

     # setserial -g /dev/ttyS*
     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   The mtty driver on the host loops the data back.

8. Destroy the mediated device that you created::

     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
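
Steps 2 and 8 write to sysfs attributes by hand. A program automating step 2
might validate the UUID and build the create path first; the sketch below
assumes the standard 8-4-4-4-12 hexadecimal UUID text form used in the example,
and is_canonical_uuid() and create_path() are hypothetical helper names. It does
not write to sysfs itself.

.. code-block:: c

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Check the 8-4-4-4-12 hexadecimal layout of a textual UUID. */
    static int is_canonical_uuid(const char *s)
    {
        size_t i;

        if (strlen(s) != 36)
            return 0;
        for (i = 0; i < 36; i++) {
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                if (s[i] != '-')
                    return 0;
            } else if (!isxdigit((unsigned char)s[i])) {
                return 0;
            }
        }
        return 1;
    }

    /* Build <parent>/mdev_supported_types/<type-id>/create; -1 if too long. */
    static int create_path(char *buf, size_t len,
                           const char *parent, const char *type_id)
    {
        int n = snprintf(buf, len, "%s/mdev_supported_types/%s/create",
                         parent, type_id);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
    }

With the parent /sys/devices/virtual/mtty/mtty and the type mtty-2, create_path()
yields the same path that the echo command in step 2 writes to.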

References
==========

1. See Documentation/driver-api/vfio.rst for more information on VFIO.
2. struct mdev_driver in include/linux/mdev.h
3. struct mdev_parent_ops in include/linux/mdev.h
4. struct vfio_iommu_driver_ops in include/linux/vfio.h
