.. include:: <isonum.txt>

=====================
VFIO Mediated devices
=====================

:Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved.
:Author: Neo Jia <cjia@nvidia.com>
:Author: Kirti Wankhede <kwankhede@nvidia.com>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.


Virtual Function I/O (VFIO) Mediated devices[1]
===============================================

The number of use cases for virtualizing DMA devices that do not have built-in
SR-IOV capability is increasing. Previously, to virtualize such devices,
developers had to create their own management interfaces and APIs, and then
integrate them with user space software. To simplify integration with user space
software, we have identified common requirements and a unified management
interface for such devices.

The VFIO driver framework provides unified APIs for direct device access. It is
an IOMMU/device-agnostic framework for exposing direct device access to user
space in a secure, IOMMU-protected environment. This framework is used for
multiple devices, such as GPUs, network adapters, and compute accelerators. With
direct device access, virtual machines or user space applications have direct
access to the physical device. This framework is reused for mediated devices.

The mediated core driver provides a common interface for mediated device
management that can be used by drivers of different devices. This module
provides a generic interface to perform these operations:

* Create and destroy a mediated device
* Add a mediated device to and remove it from a mediated bus driver
* Add a mediated device to and remove it from an IOMMU group

The mediated core driver also provides an interface to register a bus driver.
For example, the mediated VFIO mdev driver is designed for mediated devices and
supports VFIO APIs. The mediated bus driver adds a mediated device to and
removes it from a VFIO group.

The following high-level block diagram shows the main components and interfaces
in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM
devices as examples, because these were the first devices to use this module::

     +---------------+
     |               |
     | +-----------+ |  mdev_register_driver() +--------------+
     | |           | +<------------------------+              |
     | |  mdev     | |                         |              |
     | |  bus      | +------------------------>+ vfio_mdev.ko |<-> VFIO user
     | |  driver   | |     probe()/remove()    |              |    APIs
     | |           | |                         +--------------+
     | +-----------+ |
     |               |
     |  MDEV CORE    |
     |   MODULE      |
     |   mdev.ko     |
     | +-----------+ |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         |  nvidia.ko   |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | | Physical  | |
     | |  device   | |  mdev_register_device() +--------------+
     | | interface | +<------------------------+              |
     | |           | |                         |  i915.ko     |<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | |           | |
     | |           | |  mdev_register_device() +--------------+
     | |           | +<------------------------+              |
     | |           | |                         | ccw_device.ko|<-> physical
     | |           | +------------------------>+              |    device
     | |           | |        callbacks        +--------------+
     | +-----------+ |
     +---------------+


Registration Interfaces
=======================

The mediated core driver provides the following types of registration
interfaces:

* Registration interface for a mediated bus driver
* Physical device driver interface

Registration Interface for a Mediated Bus Driver
------------------------------------------------

The registration interface for a mediated bus driver provides the following
structure to represent a mediated device's driver::

     /*
      * struct mdev_driver [2] - Mediated device's driver
      * @probe: called when a new device is created
      * @remove: called when a device is removed
      * @driver: device driver structure
      */
     struct mdev_driver {
             int  (*probe)  (struct mdev_device *dev);
             void (*remove) (struct mdev_device *dev);
             struct device_driver    driver;
     };

A mediated bus driver for mdev should use this structure in the function calls
to register and unregister itself with the core driver:

* Register::

    extern int  mdev_register_driver(struct mdev_driver *drv);

* Unregister::

    extern void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding mediated devices to the VFIO
group when devices are bound to the driver and removing mediated devices from
the VFIO group when devices are unbound from the driver.
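
From user space, the result of this binding can be observed through the generic
driver-model ``driver`` symlink that sysfs places in the directory of a bound
device. The following shell function is a minimal sketch under that assumption
and is not part of the mdev interface. The name ``mdev_bound_driver`` and its
optional sysfs root argument are hypothetical; the root argument exists only so
that the function can be tried against a scratch directory:

```shell
# Print the name of the driver an mdev device is bound to, or return 1 if
# the device is unbound or does not exist.
#
# $1: mdev UUID
# $2: sysfs root (hypothetical parameter, defaults to /sys)
mdev_bound_driver() {
    uuid=$1
    root=${2:-/sys}
    link="$root/bus/mdev/devices/$uuid/driver"

    # The driver core creates the "driver" symlink only while a device is
    # bound, so a missing link is treated as "not bound".
    [ -L "$link" ] || return 1

    # The link target ends in the driver's directory name.
    basename "$(readlink "$link")"
}
```

On a host where the device is bound to the mediated bus driver,
``mdev_bound_driver $MDEV_UUID`` would be expected to print the name of that
driver.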


Physical Device Driver Interface
--------------------------------

The physical device driver interface provides the mdev_parent_ops[3] structure
to define the APIs to manage work in the mediated core driver that is related
to the physical device.

The members of the mdev_parent_ops structure are as follows:

* dev_attr_groups: attributes of the parent device
* mdev_attr_groups: attributes of the mediated device
* supported_config: attributes to define supported configurations
* device_driver: device driver to bind for mediated device instances

The mdev_parent_ops structure also still has various function pointers. These
exist for historical reasons only and shall not be used for new drivers.

When a driver wants to add the GUID creation sysfs to an existing device that
it has probed, it should call::

        extern int  mdev_register_device(struct device *dev,
                                         const struct mdev_parent_ops *ops);

This call provides the 'mdev_supported_types/XX/create' files, which can then
be used to trigger the creation of an mdev_device. The created mdev_device is
attached to the specified driver.

When the driver needs to remove itself, it calls::

        extern void mdev_unregister_device(struct device *dev);

This call unbinds and destroys all the created mdevs and removes the sysfs
files.

Mediated Device Management Interface Through sysfs
==================================================

The management interface through sysfs enables user space software, such as
libvirt, to query and configure mediated devices in a hardware-agnostic fashion.
This management interface provides flexibility to the underlying physical
device's driver to support features such as:

* Mediated device hot plug
* Multiple mediated devices in a single virtual machine
* Multiple mediated devices from different physical devices

Links in the mdev_bus Class Directory
-------------------------------------

The /sys/class/mdev_bus/ directory contains links to devices that are registered
with the mdev core driver.
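
These links can be enumerated with a few lines of shell. The following sketch
assumes only the directory described above. The function name
``mdev_list_parents`` and the optional sysfs root argument are hypothetical;
the root argument is there so that the function can be tried against a scratch
directory:

```shell
# Print the resolved sysfs path of every parent device registered with the
# mdev core, one per line.
#
# $1: sysfs root (hypothetical parameter, defaults to /sys)
mdev_list_parents() {
    root=${1:-/sys}

    for link in "$root"/class/mdev_bus/*; do
        # With no registered parents the glob stays unexpanded; skip it.
        [ -e "$link" ] || continue
        # Each entry is a symlink; resolve it to the real device directory
        # in a subshell so the caller's working directory is unchanged.
        (cd -P "$link" && pwd -P)
    done
}
```

Resolving the links matters because the per-device directories described in the
following sections live under the real device path, not under the class
directory.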

Directories and Files Under the sysfs for Each Physical Device
--------------------------------------------------------------

::

  |- [parent physical device]
  |--- Vendor-specific-attributes [optional]
  |--- [mdev_supported_types]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |     |   |--- create
  |     |   |--- name
  |     |   |--- available_instances
  |     |   |--- device_api
  |     |   |--- description
  |     |   |--- [devices]
  |     |--- [<type-id>]
  |          |--- create
  |          |--- name
  |          |--- available_instances
  |          |--- device_api
  |          |--- description
  |          |--- [devices]

* [mdev_supported_types]

  The list of currently supported mediated device types and their details.

  [<type-id>], device_api, and available_instances are mandatory attributes
  that should be provided by the vendor driver.

* [<type-id>]

  The [<type-id>] name is created by adding the device driver string as a prefix
  to the string provided by the vendor driver. The format of this name is as
  follows::

      sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name);

  Outside of the core mdev code, use mdev_parent_dev(mdev) to arrive at the
  parent device.

* device_api

  This attribute should show which device API is being created, for example,
  "vfio-pci" for a PCI device.

* available_instances

  This attribute should show the number of devices of type <type-id> that can be
  created.

* [devices]

  This directory contains links to the devices of type <type-id> that have been
  created.

* name

  This attribute should show a human-readable name. This is an optional
  attribute.

* description

  This attribute should show a brief description of the features of the type.
  This is an optional attribute.
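
As an illustration of this layout, the following sketch prints one summary line
per supported type of a parent device. The function name ``mdev_list_types``
and the ``-`` placeholder for a missing optional ``name`` attribute are choices
made for this example and are not part of the interface:

```shell
# Print one line per supported type of a parent device:
#   <type-id> <device_api> <available_instances> <name>
#
# $1: sysfs directory of the parent device
mdev_list_types() {
    parent=$1

    for type_dir in "$parent"/mdev_supported_types/*; do
        # Skip the unexpanded glob when there are no types.
        [ -d "$type_dir" ] || continue
        type_id=$(basename "$type_dir")

        # Mandatory attributes.
        api=$(cat "$type_dir/device_api")
        avail=$(cat "$type_dir/available_instances")

        # "name" is optional, so fall back to a placeholder.
        name="-"
        if [ -r "$type_dir/name" ]; then
            name=$(cat "$type_dir/name")
        fi

        printf '%s %s %s %s\n' "$type_id" "$api" "$avail" "$name"
    done
}
```

The function takes the parent device directory as its only argument, so it can
be pointed at any device that exposes an ``mdev_supported_types`` directory.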

Directories and Files Under the sysfs for Each mdev Device
----------------------------------------------------------

::

  |- [parent physical device]
  |--- [$MDEV_UUID]
         |--- remove
         |--- mdev_type {link to its type}
         |--- vendor-specific-attributes [optional]

* remove (write only)

  Writing '1' to the 'remove' file destroys the mdev device. The vendor driver
  can fail the remove() callback if that device is active and the vendor driver
  doesn't support hot unplug.

  Example::

      # echo 1 > /sys/bus/mdev/devices/$MDEV_UUID/remove
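
A script that removes devices may want to distinguish a missing device from a
removal that the vendor driver rejects. The following sketch does so. The
function name ``mdev_remove`` and the optional sysfs root argument are
hypothetical; the root argument is there so that the function can be tried
against a scratch directory:

```shell
# Remove an mdev device by UUID. Returns 1 if the device does not exist,
# otherwise the status of the write to its "remove" attribute.
#
# $1: mdev UUID
# $2: sysfs root (hypothetical parameter, defaults to /sys)
mdev_remove() {
    uuid=$1
    root=${2:-/sys}
    dev="$root/bus/mdev/devices/$uuid"

    if [ ! -e "$dev/remove" ]; then
        echo "mdev $uuid: no such device" >&2
        return 1
    fi

    # The write fails if the vendor driver refuses the removal, for example
    # because the device is active and hot unplug is not supported.
    echo 1 > "$dev/remove"
}
```

Because the function returns the status of the write, a caller can retry later
or report the failure when the vendor driver refuses the removal.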

Mediated Device Hot Plug
------------------------

Mediated devices can be created and assigned at runtime. The procedure to hot
plug a mediated device is the same as the procedure to hot plug a PCI device.

Translation APIs for Mediated Devices
=====================================

The following APIs are provided for translating a user PFN to a host PFN in a
VFIO driver::

        extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
                                  int npage, int prot, unsigned long *phys_pfn);

        extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
                                    int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently,
these callbacks are supported in the TYPE1 IOMMU module. To enable them for
other IOMMU back-end modules, such as the PPC64 sPAPR module, those modules
must provide these two callback functions.

Using the Sample Code
=====================

mtty.c in the samples/vfio-mdev/ directory is a sample driver program that
demonstrates how to use the mediated device framework.

The sample driver creates an mdev device that simulates a serial port over a PCI
card.

1. Build and load the mtty.ko module.

   This step creates a dummy device, /sys/devices/virtual/mtty/mtty/.

   Files in this device directory in sysfs are similar to the following::

     # tree /sys/devices/virtual/mtty/mtty/
        /sys/devices/virtual/mtty/mtty/
        |-- mdev_supported_types
        |   |-- mtty-1
        |   |   |-- available_instances
        |   |   |-- create
        |   |   |-- device_api
        |   |   |-- devices
        |   |   `-- name
        |   `-- mtty-2
        |       |-- available_instances
        |       |-- create
        |       |-- device_api
        |       |-- devices
        |       `-- name
        |-- mtty_dev
        |   `-- sample_mtty_dev
        |-- power
        |   |-- autosuspend_delay_ms
        |   |-- control
        |   |-- runtime_active_time
        |   |-- runtime_status
        |   `-- runtime_suspended_time
        |-- subsystem -> ../../../../class/mtty
        `-- uevent

2. Create a mediated device by using the dummy device that you created in the
   previous step::

     # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \
              /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create

3. Add parameters to qemu-kvm::

     -device vfio-pci,\
      sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001

4. Boot the VM.

   In the Linux guest VM, with no hardware on the host, the device appears
   as follows::

     # lspci -s 00:05.0 -xxvv
     00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550])
             Subsystem: Device 4348:3253
             Physical Slot: 5
             Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr-
     Stepping- SERR- FastB2B- DisINTx-
             Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
     <TAbort- <MAbort- >SERR- <PERR- INTx-
             Interrupt: pin A routed to IRQ 10
             Region 0: I/O ports at c150 [size=8]
             Region 1: I/O ports at c158 [size=8]
             Kernel driver in use: serial
     00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00
     10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00
     20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32
     30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00

   In the Linux guest VM, dmesg output for the device is as follows::

     serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
     0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A
     0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A

5. In the Linux guest VM, check the serial ports::

     # setserial -g /dev/ttyS*
     /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
     /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10
     /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10

6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or
   /dev/ttyS2 with hardware flow control disabled.

7. Type data on the minicom terminal or send data to the terminal emulation
   program and read the data.

   Data is looped back from the host's mtty driver.

8. Destroy the mediated device that you created::

     # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove
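
The create and assign steps above can be wrapped in small shell helpers. This
is a sketch that assumes the sysfs paths shown in the steps. The function names
``is_uuid``, ``mdev_create``, and ``mdev_qemu_arg`` are hypothetical, and the
UUID format check is a convenience of this example, not something the sample
driver is known to require:

```shell
# Return 0 if $1 looks like a canonical 8-4-4-4-12 hexadecimal UUID.
is_uuid() {
    printf '%s\n' "$1" | grep -Eq \
        '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Create an mdev device of the given type under a parent device.
#
# $1: sysfs directory of the parent device
# $2: type-id, for example mtty-2
# $3: UUID for the new device
mdev_create() {
    parent=$1
    type_id=$2
    uuid=$3
    create="$parent/mdev_supported_types/$type_id/create"

    is_uuid "$uuid" || { echo "invalid UUID: $uuid" >&2; return 1; }
    [ -e "$create" ] || { echo "unsupported type: $type_id" >&2; return 1; }

    # Writing the UUID to "create" asks the mdev core to create the device.
    echo "$uuid" > "$create"
}

# Print the qemu-kvm option that assigns the mdev device to a guest.
#
# $1: mdev UUID
mdev_qemu_arg() {
    printf '%s\n' "-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$1"
}
```

For example, ``mdev_create /sys/devices/virtual/mtty/mtty mtty-2
83b8f4f2-509f-382f-3c1e-e6bfe0fa1001`` corresponds to step 2, and
``mdev_qemu_arg 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001`` prints the option used
in step 3.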

References
==========

1. See Documentation/driver-api/vfio.rst for more information on VFIO.
2. struct mdev_driver in include/linux/mdev.h
3. struct mdev_parent_ops in include/linux/mdev.h
4. struct vfio_iommu_driver_ops in include/linux/vfio.h