QEMU Virtual NVDIMM
===================

This document explains the usage of the virtual NVDIMM (vNVDIMM) feature,
which has been available since QEMU v2.6.0.

QEMU currently implements only the persistent memory mode of the vNVDIMM
device, not the block window mode.

Basic Usage
-----------

The storage of a vNVDIMM device in QEMU is provided by a memory
backend (i.e. memory-backend-file or memory-backend-ram). A simple
way to create a vNVDIMM device at startup time is to use the
following command line options:

 -machine pc,nvdimm
 -m $RAM_SIZE,slots=$N,maxmem=$MAX_SIZE
 -object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE
 -device nvdimm,id=nvdimm1,memdev=mem1

Where,

 - the "nvdimm" machine option enables the vNVDIMM feature.

 - "slots=$N" should be equal to or larger than the total number of
   normal RAM devices and vNVDIMM devices, e.g. $N should be >= 2 here.

 - "maxmem=$MAX_SIZE" should be equal to or larger than the total size
   of normal RAM devices and vNVDIMM devices, e.g. $MAX_SIZE should be
   >= $RAM_SIZE + $NVDIMM_SIZE here.

 - "object memory-backend-file,id=mem1,share=on,mem-path=$PATH,size=$NVDIMM_SIZE"
   creates backend storage of size $NVDIMM_SIZE in the file $PATH. All
   accesses to the virtual NVDIMM device go to the file $PATH.

   "share=on/off" controls the visibility of guest writes. If
   "share=on", then guest writes will be applied to the backend
   file. If another guest uses the same backend file with option
   "share=on", then those writes will be visible to it as well. If
   "share=off", then guest writes won't be applied to the backend
   file and thus will be invisible to other guests.

 - "device nvdimm,id=nvdimm1,memdev=mem1" creates a virtual NVDIMM
   device whose storage is provided by the above memory backend.

Multiple vNVDIMM devices can be created if multiple pairs of "-object"
and "-device" are provided.
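
As a rough illustration of the sizing rules above, the following shell
sketch derives "slots" and "maxmem" for a guest with two vNVDIMM devices
and prints the resulting options. The sizes, the backend paths under /tmp
and the variable names are placeholders chosen for this example, not
values that QEMU requires.

```shell
#!/bin/sh
# Hypothetical sizes in MiB; adjust them to the actual guest.
RAM_MB=4096
NVDIMM_MB=2048
NVDIMM_COUNT=2

# One slot per vNVDIMM device plus one, following the "$N >= 2" example
# given for a single vNVDIMM device.
SLOTS=$((NVDIMM_COUNT + 1))

# maxmem must cover normal RAM plus every vNVDIMM device.
MAXMEM_MB=$((RAM_MB + NVDIMM_MB * NVDIMM_COUNT))

ARGS="-machine pc,nvdimm -m ${RAM_MB}M,slots=${SLOTS},maxmem=${MAXMEM_MB}M"

i=1
while [ "$i" -le "$NVDIMM_COUNT" ]; do
    # Each device needs its own -object/-device pair; the path is a placeholder.
    ARGS="$ARGS -object memory-backend-file,id=mem$i,share=on,mem-path=/tmp/nvdimm$i.img,size=${NVDIMM_MB}M"
    ARGS="$ARGS -device nvdimm,id=nvdimm$i,memdev=mem$i"
    i=$((i + 1))
done

# Print the options so they can be reviewed before being passed to QEMU.
echo "$ARGS"
```

The script only prints the options; it does not start QEMU or create the
backend files.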

With the basic command line options above, if the guest OS has the proper
NVDIMM driver, it should be able to detect an NVDIMM device which is in the
persistent memory mode and whose size is $NVDIMM_SIZE.

Note:

1. Prior to QEMU v2.8.0, if memory-backend-file is used and the actual
   backend file size is not equal to the size given by the "size" option,
   QEMU will truncate the backend file by ftruncate(2), which will
   corrupt the existing data in the backend file, especially for the
   shrink case.

   QEMU v2.8.0 and later check the backend file size against the "size"
   option. If they do not match, QEMU will report an error and abort in
   order to avoid data corruption.

2. QEMU v2.6.0 only puts a basic alignment requirement on the "size"
   option of memory-backend-file, e.g. 4KB alignment on x86. However,
   QEMU v2.7.0 puts an additional alignment requirement, which may
   require a larger value than the basic one, e.g. 2MB on x86. This
   change breaks the usage of memory-backend-file that only satisfies
   the basic alignment.

   QEMU v2.8.0 and later remove the additional alignment on non-s390x
   architectures, so the broken memory-backend-file can work again.

Label
-----

QEMU v2.7.0 and later implement label support for vNVDIMM devices.
To enable labels on a vNVDIMM device, users can simply add the
"label-size=$SZ" option to "-device nvdimm", e.g.

 -device nvdimm,id=nvdimm1,memdev=mem1,label-size=128K

Note:

1. The minimal label size is 128KB.

2. QEMU v2.7.0 and later store labels at the end of the backend storage.
   If a memory backend file, which was previously used as the backend
   of a vNVDIMM device without labels, is now used for a vNVDIMM
   device with labels, the data in the label area at the end of the file
   will be inaccessible to the guest. If any useful data (e.g. the
   meta-data of the file system) was stored there, the latter usage
   may result in guest data corruption (e.g. breakage of the guest file
   system).

Hotplug
-------

QEMU v2.8.0 and later implement hotplug support for vNVDIMM
devices. Similarly to RAM hotplug, vNVDIMM hotplug is
accomplished by two monitor commands, "object_add" and "device_add".

For example, the following commands add another 4GB vNVDIMM device to
the guest:

 (qemu) object_add memory-backend-file,id=mem2,share=on,mem-path=new_nvdimm.img,size=4G
 (qemu) device_add nvdimm,id=nvdimm2,memdev=mem2

Note:

1. Each hotplugged vNVDIMM device consumes one memory slot. Users
   should always ensure the memory option "-m ...,slots=N" specifies
   enough slots, i.e.
     N >= number of RAM devices +
          number of statically plugged vNVDIMM devices +
          number of hotplugged vNVDIMM devices

2. A similar constraint applies to the memory option "-m ...,maxmem=M", i.e.
     M >= size of RAM devices +
          size of statically plugged vNVDIMM devices +
          size of hotplugged vNVDIMM devices

Alignment
---------

QEMU uses mmap(2) to map vNVDIMM backends and aligns the mapping
address to the page size (getpagesize(2)) by default. However, some
types of backends may require an alignment different from the page
size. For such cases, QEMU v2.12.0 and later provide the 'align' option
of memory-backend-file to allow users to specify the proper alignment.

For example, device DAX requires 2 MB alignment, so the following QEMU
command line options can be used to make it (/dev/dax0.0) the
backend of a vNVDIMM:

 -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
 -device nvdimm,id=nvdimm1,memdev=mem1

Guest Data Persistence
----------------------

Though QEMU supports multiple types of vNVDIMM backends on Linux,
currently the only one that can guarantee guest write persistence
is device DAX on a real NVDIMM device (e.g., /dev/dax0.0), because
guest accesses to it do not involve any host-side kernel cache.

When using other types of backends, it is suggested to set the 'unarmed'
option of '-device nvdimm' to 'on', which sets the unarmed flag of the
guest NVDIMM region mapping structure. This unarmed flag indicates to
guest software that this vNVDIMM device contains a region that cannot
accept persistent writes. As a result, for example, the guest Linux
NVDIMM driver marks such a vNVDIMM device as read-only.
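
For a backend that is a regular file rather than device DAX, the
suggestion above might translate into options along the following lines.
This is only a sketch: the backend path, the size and the variable names
are placeholders, and the 'unarmed=on' property is the only part that
matters here.

```shell
#!/bin/sh
# Placeholder backend: a regular file, which cannot guarantee persistence.
BACKEND=/tmp/nvdimm.img
SIZE=4G

# unarmed=on tells the guest that the region cannot accept persistent writes.
NVDIMM_ARGS="-object memory-backend-file,id=mem1,share=on,mem-path=$BACKEND,size=$SIZE"
NVDIMM_ARGS="$NVDIMM_ARGS -device nvdimm,id=nvdimm1,memdev=mem1,unarmed=on"

# Print the options so they can be appended to an existing QEMU command line.
echo "$NVDIMM_ARGS"
```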