===============
QPL Compression
===============

The Intel Query Processing Library (Intel ``QPL``) is an open-source library
that provides compression and decompression features based on the deflate
compression algorithm (RFC 1951).

``QPL`` compression relies on the Intel In-Memory Analytics Accelerator
(``IAA``) and Shared Virtual Memory (``SVM``) technology. Both are new
features introduced with the 4th Gen Intel Xeon Scalable processors,
codenamed Sapphire Rapids (``SPR``).

For an introduction to ``QPL``, please refer to `QPL Introduction
<https://intel.github.io/qpl/documentation/introduction_docs/introduction.html>`_

QPL Compression Framework
=========================

::

  +----------------+       +------------------+
  | MultiFD Thread |       |accel-config tool |
  +-------+--------+       +--------+---------+
          |                         |
          |                         |
          |compress/decompress      |
  +-------+--------+                | Setup IAA
  |  QPL library   |                | Resources
  +-------+---+----+                |
          |   |                     |
          |   +-------------+-------+
          |   Open IAA      |
          |   Devices +-----+-----+
          |           |idxd driver|
          |           +-----+-----+
          |                 |
          |                 |
          |           +-----+-----+
          +-----------+IAA Devices|
      Submit jobs     +-----------+
      via enqcmd


QPL Build And Installation
--------------------------

.. code-block:: shell

  $git clone --recursive https://github.com/intel/qpl.git qpl
  $mkdir qpl/build
  $cd qpl/build
  $cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DQPL_LIBRARY_TYPE=SHARED ..
  $sudo cmake --build . --target install

For more details about ``QPL`` installation, please refer to `QPL Installation
<https://intel.github.io/qpl/documentation/get_started_docs/installation.html>`_

IAA Device Management
---------------------

The number of ``IAA`` devices will vary depending on the Xeon product model.
On an ``SPR`` server, there can be a maximum of 8 ``IAA`` devices, with up to
4 devices per socket.
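
The devices that the ``idxd`` driver has enumerated, and whether each one is
currently enabled, can also be checked from sysfs. A minimal sketch (the
``state`` attribute is assumed from the idxd driver's sysfs interface; it
reads ``disabled`` until the device is configured and enabled):

.. code-block:: shell

  # Print each IAA device known to the idxd driver and its current state.
  # Prints nothing if no IAA device is present or the driver is not loaded.
  for dev in /sys/bus/dsa/devices/iax*; do
      [ -e "$dev" ] || continue
      printf '%s: %s\n' "${dev##*/}" "$(cat "$dev/state")"
  done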

By default, all ``IAA`` devices are disabled and need to be configured and
enabled by users manually.

Check the number of devices with the following command:

.. code-block:: shell

  #lspci -d 8086:0cfe
  6a:02.0 System peripheral: Intel Corporation Device 0cfe
  6f:02.0 System peripheral: Intel Corporation Device 0cfe
  74:02.0 System peripheral: Intel Corporation Device 0cfe
  79:02.0 System peripheral: Intel Corporation Device 0cfe
  e7:02.0 System peripheral: Intel Corporation Device 0cfe
  ec:02.0 System peripheral: Intel Corporation Device 0cfe
  f1:02.0 System peripheral: Intel Corporation Device 0cfe
  f6:02.0 System peripheral: Intel Corporation Device 0cfe

IAA Device Configuration And Enabling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``accel-config`` tool is used to enable ``IAA`` devices and configure
``IAA`` hardware resources (work queues and engines). One ``IAA`` device
has 8 work queues and 8 processing engines; multiple engines can be assigned
to a work queue via the ``group`` attribute.

For ``accel-config`` installation, please refer to `accel-config installation
<https://github.com/intel/idxd-config>`_

An example of configuring and enabling an ``IAA`` device:

.. code-block:: shell

  #accel-config config-engine iax1/engine1.0 -g 0
  #accel-config config-engine iax1/engine1.1 -g 0
  #accel-config config-engine iax1/engine1.2 -g 0
  #accel-config config-engine iax1/engine1.3 -g 0
  #accel-config config-engine iax1/engine1.4 -g 0
  #accel-config config-engine iax1/engine1.5 -g 0
  #accel-config config-engine iax1/engine1.6 -g 0
  #accel-config config-engine iax1/engine1.7 -g 0
  #accel-config config-wq iax1/wq1.0 -g 0 -s 128 -p 10 -b 1 -t 128 -m shared -y user -n app1 -d user
  #accel-config enable-device iax1
  #accel-config enable-wq iax1/wq1.0

.. note::
   IAX is an early name for IAA

- The ``IAA`` device index is 1; use the ``ls -lh /sys/bus/dsa/devices/iax*``
  command to query the ``IAA`` device index.

- 8 engines and 1 work queue are configured in group 0, so all compression jobs
  submitted to this work queue can be processed by all engines at the same time.

- Set work queue attributes including the work mode, work queue size and so on.

- Enable the ``IAA1`` device and work queue 1.0

.. note::

   Set the work queue mode to shared mode, since the ``QPL`` library only
   supports shared mode.

For more detailed configuration, please refer to `IAA Configuration Samples
<https://github.com/intel/idxd-config/tree/stable/Documentation/accfg>`_

IAA Unit Test
^^^^^^^^^^^^^

- For enabling ``IAA`` devices on the Xeon platform, please refer to `IAA User Guide
  <https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html>`_

- The ``IAA`` device driver is the Intel Data Accelerator Driver (idxd); a
  minimum Linux kernel version of 5.18 is recommended.

- Add the ``"intel_iommu=on,sm_on"`` parameter to the kernel command line
  to enable the ``SVM`` feature.

Here is an easy way to verify the ``IAA`` device driver and ``SVM`` with `iaa_test
<https://github.com/intel/idxd-config/tree/stable/test>`_

.. code-block:: shell

  #./test/iaa_test
  [ info] alloc wq 0 shared size 128 addr 0x7f26cebe5000 batch sz 0xfffffffe xfer sz 0x80000000
  [ info] test noop: tflags 0x1 num_desc 1
  [ info] preparing descriptor for noop
  [ info] Submitted all noop jobs
  [ info] verifying task result for 0x16f7e20
  [ info] test with op 0 passed


IAA Resources Allocation For Migration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are no ``IAA`` resource configuration parameters for migration, and
the ``accel-config`` tool configuration cannot directly specify the ``IAA``
resources used for migration.

Multifd migration with the ``QPL`` compression method will use all work
queues that are enabled and in shared mode.

.. note::

   Accessing IAA resources requires the ``sudo`` command or ``root`` privileges
   by default. Administrators can modify the IAA device node ownership
   so that QEMU can use IAA with specified user permissions.

   For example:

   .. code-block:: shell

     #chown -R qemu /dev/iax

Shared Virtual Memory (SVM) Introduction
========================================

``SVM`` is the ability for an accelerator I/O device to operate in the same
virtual memory space as applications on host processors. It also implies the
ability to operate from pageable memory, avoiding functional requirements
to pin memory for DMA operations.

When using ``SVM`` technology, users do not need to reserve memory for the
``IAA`` device or pin memory for it. The ``IAA`` device can
directly access data using the virtual address of the process.
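
Whether the host actually provides the ``SVM`` prerequisites can be checked
from userspace before starting a migration. A hedged sketch: the ``enqcmd``
CPU flag appears in ``/proc/cpuinfo``, and the idxd driver is assumed to
expose a per-device ``pasid_enabled`` sysfs attribute (attribute names may
vary with the driver version):

.. code-block:: shell

  # Check for the ENQCMD instruction used to submit jobs to IAA devices.
  if grep -qw enqcmd /proc/cpuinfo; then
      echo "ENQCMD supported"
  else
      echo "ENQCMD not supported"
  fi
  # With intel_iommu=on,sm_on, each IAA device should report PASID (SVM)
  # support; prints nothing if no IAA device is present.
  cat /sys/bus/dsa/devices/iax*/pasid_enabled 2>/dev/null || true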

For more about ``SVM`` technology, please refer to
`Shared Virtual Addressing (SVA) with ENQCMD
<https://docs.kernel.org/next/x86/sva.html>`_


How To Use QPL Compression In Migration
=======================================

1 - Install the ``QPL`` library, and the ``accel-config`` library if using IAA

2 - Configure and enable ``IAA`` devices and work queues via ``accel-config``

3 - Build ``QEMU`` with the ``--enable-qpl`` parameter

    E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qpl``

4 - Enable ``QPL`` compression during migration

    Set ``migrate_set_parameter multifd-compression qpl`` when migrating.
    ``QPL`` compression does not support configuring the compression level;
    it supports only one compression level.

The Difference Between QPL And ZLIB
===================================

Although both ``QPL`` and ``ZLIB`` are based on the deflate compression
algorithm, and ``QPL`` supports the ``ZLIB`` header and trailer, ``QPL``
is still not fully compatible with ``ZLIB`` compression in migration.

``QPL`` only supports a 4K history buffer, while ``ZLIB`` uses 32K by default.
Data compressed by ``ZLIB`` may not be decompressed correctly by ``QPL``, and
vice versa.

``QPL`` does not support the ``Z_SYNC_FLUSH`` operation of ``ZLIB`` streaming
compression. The current ``ZLIB`` implementation uses ``Z_SYNC_FLUSH``, so each
``multifd`` thread has a ``ZLIB`` streaming context, and all page compression
and decompression are based on this stream. ``QPL`` cannot decompress such data,
and vice versa.

For an introduction to ``Z_SYNC_FLUSH``, please refer to the `Zlib Manual
<https://www.zlib.net/manual.html>`_

The Best Practices
==================

When a user enables the IAA device for ``QPL`` compression, it is recommended
to add the ``-mem-prealloc`` parameter to the destination boot parameters. This
parameter can avoid the occurrence of I/O page faults and reduce the overhead
of IAA compression and decompression.

An example of booting with the ``-mem-prealloc`` parameter:

.. code-block:: shell

  $qemu-system-x86_64 --enable-kvm -cpu host --mem-prealloc ...


An example of I/O page fault measurement on a destination booted without
``-mem-prealloc``; the ``svm_prq`` row indicates the number of I/O page fault
occurrences and the processing time.

.. code-block:: shell

  #echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 2 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 3 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 4 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
  IOMMU: dmar18 Register Base Address: c87fc000
              <0.1us 0.1us-1us 1us-10us 10us-100us 100us-1ms 1ms-10ms >=10ms min(us) max(us) average(us)
   inv_iotlb       0       286      123          0         0        0      0       0       1           0
  inv_devtlb       0       276      133          0         0        0      0       0       2           0
     inv_iec       0         0        0          0         0        0      0       0       0           0
     svm_prq       0         0    25206        364       395        0      0       1     556           9