1======================================== 2zram: Compressed RAM based block devices 3======================================== 4 5Introduction 6============ 7 8The zram module creates RAM based block devices named /dev/zram<id> 9(<id> = 0, 1, ...). Pages written to these disks are compressed and stored 10in memory itself. These disks allow very fast I/O and compression provides 11good amounts of memory savings. Some of the usecases include /tmp storage, 12use as swap disks, various caches under /var and maybe many more :) 13 14Statistics for individual zram devices are exported through sysfs nodes at 15/sys/block/zram<id>/ 16 17Usage 18===== 19 20There are several ways to configure and manage zram device(-s): 21 22a) using zram and zram_control sysfs attributes 23b) using zramctl utility, provided by util-linux (util-linux@vger.kernel.org). 24 25In this document we will describe only 'manual' zram configuration steps, 26IOW, zram and zram_control sysfs attributes. 27 28In order to get a better idea about zramctl please consult util-linux 29documentation, zramctl man-page or `zramctl --help`. Please be informed 30that zram maintainers do not develop/maintain util-linux or zramctl, should 31you have any questions please contact util-linux@vger.kernel.org 32 33Following shows a typical sequence of steps for using zram. 34 35WARNING 36======= 37 38For the sake of simplicity we skip error checking parts in most of the 39examples below. However, it is your sole responsibility to handle errors. 40 41zram sysfs attributes always return negative values in case of errors. 42The list of possible return codes: 43 44======== ============================================================= 45-EBUSY an attempt to modify an attribute that cannot be changed once 46 the device has been initialised. Please reset device first; 47-ENOMEM zram was not able to allocate enough memory to fulfil your 48 needs; 49-EINVAL invalid input has been provided. 50======== ============================================================= 51 52If you use 'echo', the returned value that is changed by 'echo' utility, 53and, in general case, something like:: 54 55 echo 3 > /sys/block/zram0/max_comp_streams 56 if [ $? -ne 0 ]; 57 handle_error 58 fi 59 60should suffice. 61 621) Load Module 63============== 64 65:: 66 67 modprobe zram num_devices=4 68 This creates 4 devices: /dev/zram{0,1,2,3} 69 70num_devices parameter is optional and tells zram how many devices should be 71pre-created. Default: 1. 72 732) Set max number of compression streams 74======================================== 75 76Regardless the value passed to this attribute, ZRAM will always 77allocate multiple compression streams - one per online CPUs - thus 78allowing several concurrent compression operations. The number of 79allocated compression streams goes down when some of the CPUs 80become offline. There is no single-compression-stream mode anymore, 81unless you are running a UP system or has only 1 CPU online. 82 83To find out how many streams are currently available:: 84 85 cat /sys/block/zram0/max_comp_streams 86 873) Select compression algorithm 88=============================== 89 90Using comp_algorithm device attribute one can see available and 91currently selected (shown in square brackets) compression algorithms, 92change selected compression algorithm (once the device is initialised 93there is no way to change compression algorithm). 94 95Examples:: 96 97 #show supported compression algorithms 98 cat /sys/block/zram0/comp_algorithm 99 lzo [lz4] 100 101 #select lzo compression algorithm 102 echo lzo > /sys/block/zram0/comp_algorithm 103 104For the time being, the `comp_algorithm` content does not necessarily 105show every compression algorithm supported by the kernel. We keep this 106list primarily to simplify device configuration and one can configure 107a new device with a compression algorithm that is not listed in 108`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API 109and, if some of the algorithms were built as modules, it's impossible 110to list all of them using, for instance, /proc/crypto or any other 111method. This, however, has an advantage of permitting the usage of 112custom crypto compression modules (implementing S/W or H/W compression). 113 1144) Set Disksize 115=============== 116 117Set disk size by writing the value to sysfs node 'disksize'. 118The value can be either in bytes or you can use mem suffixes. 119Examples:: 120 121 # Initialize /dev/zram0 with 50MB disksize 122 echo $((50*1024*1024)) > /sys/block/zram0/disksize 123 124 # Using mem suffixes 125 echo 256K > /sys/block/zram0/disksize 126 echo 512M > /sys/block/zram0/disksize 127 echo 1G > /sys/block/zram0/disksize 128 129Note: 130There is little point creating a zram of greater than twice the size of memory 131since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the 132size of the disk when not in use so a huge zram is wasteful. 133 1345) Set memory limit: Optional 135============================= 136 137Set memory limit by writing the value to sysfs node 'mem_limit'. 138The value can be either in bytes or you can use mem suffixes. 139In addition, you could change the value in runtime. 140Examples:: 141 142 # limit /dev/zram0 with 50MB memory 143 echo $((50*1024*1024)) > /sys/block/zram0/mem_limit 144 145 # Using mem suffixes 146 echo 256K > /sys/block/zram0/mem_limit 147 echo 512M > /sys/block/zram0/mem_limit 148 echo 1G > /sys/block/zram0/mem_limit 149 150 # To disable memory limit 151 echo 0 > /sys/block/zram0/mem_limit 152 1536) Activate 154=========== 155 156:: 157 158 mkswap /dev/zram0 159 swapon /dev/zram0 160 161 mkfs.ext4 /dev/zram1 162 mount /dev/zram1 /tmp 163 1647) Add/remove zram devices 165========================== 166 167zram provides a control interface, which enables dynamic (on-demand) device 168addition and removal. 169 170In order to add a new /dev/zramX device, perform read operation on hot_add 171attribute. This will return either new device's device id (meaning that you 172can use /dev/zram<id>) or error code. 173 174Example:: 175 176 cat /sys/class/zram-control/hot_add 177 1 178 179To remove the existing /dev/zramX device (where X is a device id) 180execute:: 181 182 echo X > /sys/class/zram-control/hot_remove 183 1848) Stats 185======== 186 187Per-device statistics are exported as various nodes under /sys/block/zram<id>/ 188 189A brief description of exported device attributes. For more details please 190read Documentation/ABI/testing/sysfs-block-zram. 191 192====================== ====== =============================================== 193Name access description 194====================== ====== =============================================== 195disksize RW show and set the device's disk size 196initstate RO shows the initialization state of the device 197reset WO trigger device reset 198mem_used_max WO reset the `mem_used_max` counter (see later) 199mem_limit WO specifies the maximum amount of memory ZRAM can 200 use to store the compressed data 201writeback_limit WO specifies the maximum amount of write IO zram 202 can write out to backing device as 4KB unit 203writeback_limit_enable RW show and set writeback_limit feature 204max_comp_streams RW the number of possible concurrent compress 205 operations 206comp_algorithm RW show and change the compression algorithm 207compact WO trigger memory compaction 208debug_stat RO this file is used for zram debugging purposes 209backing_dev RW set up backend storage for zram to write out 210idle WO mark allocated slot as idle 211====================== ====== =============================================== 212 213 214User space is advised to use the following files to read the device statistics. 215 216File /sys/block/zram<id>/stat 217 218Represents block layer statistics. Read Documentation/block/stat.rst for 219details. 220 221File /sys/block/zram<id>/io_stat 222 223The stat file represents device's I/O statistics not accounted by block 224layer and, thus, not available in zram<id>/stat file. It consists of a 225single line of text and contains the following stats separated by 226whitespace: 227 228 ============= ============================================================= 229 failed_reads The number of failed reads 230 failed_writes The number of failed writes 231 invalid_io The number of non-page-size-aligned I/O requests 232 notify_free Depending on device usage scenario it may account 233 234 a) the number of pages freed because of swap slot free 235 notifications 236 b) the number of pages freed because of 237 REQ_OP_DISCARD requests sent by bio. The former ones are 238 sent to a swap block device when a swap slot is freed, 239 which implies that this disk is being used as a swap disk. 240 241 The latter ones are sent by filesystem mounted with 242 discard option, whenever some data blocks are getting 243 discarded. 244 ============= ============================================================= 245 246File /sys/block/zram<id>/mm_stat 247 248The stat file represents device's mm statistics. It consists of a single 249line of text and contains the following stats separated by whitespace: 250 251 ================ ============================================================= 252 orig_data_size uncompressed size of data stored in this disk. 253 This excludes same-element-filled pages (same_pages) since 254 no memory is allocated for them. 255 Unit: bytes 256 compr_data_size compressed size of data stored in this disk 257 mem_used_total the amount of memory allocated for this disk. This 258 includes allocator fragmentation and metadata overhead, 259 allocated for this disk. So, allocator space efficiency 260 can be calculated using compr_data_size and this statistic. 261 Unit: bytes 262 mem_limit the maximum amount of memory ZRAM can use to store 263 the compressed data 264 mem_used_max the maximum amount of memory zram have consumed to 265 store the data 266 same_pages the number of same element filled pages written to this disk. 267 No memory is allocated for such pages. 268 pages_compacted the number of pages freed during compaction 269 huge_pages the number of incompressible pages 270 ================ ============================================================= 271 272File /sys/block/zram<id>/bd_stat 273 274The stat file represents device's backing device statistics. It consists of 275a single line of text and contains the following stats separated by whitespace: 276 277 ============== ============================================================= 278 bd_count size of data written in backing device. 279 Unit: 4K bytes 280 bd_reads the number of reads from backing device 281 Unit: 4K bytes 282 bd_writes the number of writes to backing device 283 Unit: 4K bytes 284 ============== ============================================================= 285 2869) Deactivate 287============= 288 289:: 290 291 swapoff /dev/zram0 292 umount /dev/zram1 293 29410) Reset 295========= 296 297 Write any positive value to 'reset' sysfs node:: 298 299 echo 1 > /sys/block/zram0/reset 300 echo 1 > /sys/block/zram1/reset 301 302 This frees all the memory allocated for the given device and 303 resets the disksize to zero. You must set the disksize again 304 before reusing the device. 305 306Optional Feature 307================ 308 309writeback 310--------- 311 312With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page 313to backing storage rather than keeping it in memory. 314To use the feature, admin should set up backing device via:: 315 316 echo /dev/sda5 > /sys/block/zramX/backing_dev 317 318before disksize setting. It supports only partition at this moment. 319If admin want to use incompressible page writeback, they could do via:: 320 321 echo huge > /sys/block/zramX/write 322 323To use idle page writeback, first, user need to declare zram pages 324as idle:: 325 326 echo all > /sys/block/zramX/idle 327 328From now on, any pages on zram are idle pages. The idle mark 329will be removed until someone request access of the block. 330IOW, unless there is access request, those pages are still idle pages. 331 332Admin can request writeback of those idle pages at right timing via:: 333 334 echo idle > /sys/block/zramX/writeback 335 336With the command, zram writeback idle pages from memory to the storage. 337 338If there are lots of write IO with flash device, potentially, it has 339flash wearout problem so that admin needs to design write limitation 340to guarantee storage health for entire product life. 341 342To overcome the concern, zram supports "writeback_limit" feature. 343The "writeback_limit_enable"'s default value is 0 so that it doesn't limit 344any writeback. IOW, if admin want to apply writeback budget, he should 345enable writeback_limit_enable via:: 346 347 $ echo 1 > /sys/block/zramX/writeback_limit_enable 348 349Once writeback_limit_enable is set, zram doesn't allow any writeback 350until admin set the budget via /sys/block/zramX/writeback_limit. 351 352(If admin doesn't enable writeback_limit_enable, writeback_limit's value 353assigned via /sys/block/zramX/writeback_limit is meaninless.) 354 355If admin want to limit writeback as per-day 400M, he could do it 356like below:: 357 358 $ MB_SHIFT=20 359 $ 4K_SHIFT=12 360 $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \ 361 /sys/block/zram0/writeback_limit. 362 $ echo 1 > /sys/block/zram0/writeback_limit_enable 363 364If admin want to allow further write again once the bugdet is exausted, 365he could do it like below:: 366 367 $ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \ 368 /sys/block/zram0/writeback_limit 369 370If admin want to see remaining writeback budget since he set:: 371 372 $ cat /sys/block/zramX/writeback_limit 373 374If admin want to disable writeback limit, he could do:: 375 376 $ echo 0 > /sys/block/zramX/writeback_limit_enable 377 378The writeback_limit count will reset whenever you reset zram(e.g., 379system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of 380writeback happened until you reset the zram to allocate extra writeback 381budget in next setting is user's job. 382 383If admin want to measure writeback count in a certain period, he could 384know it via /sys/block/zram0/bd_stat's 3rd column. 385 386memory tracking 387=============== 388 389With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the 390zram block. It could be useful to catch cold or incompressible 391pages of the process with*pagemap. 392 393If you enable the feature, you could see block state via 394/sys/kernel/debug/zram/zram0/block_state". The output is as follows:: 395 396 300 75.033841 .wh. 397 301 63.806904 s... 398 302 63.806919 ..hi 399 400First column 401 zram's block index. 402Second column 403 access time since the system was booted 404Third column 405 state of the block: 406 407 s: 408 same page 409 w: 410 written page to backing store 411 h: 412 huge page 413 i: 414 idle page 415 416First line of above example says 300th block is accessed at 75.033841sec 417and the block's state is huge so it is written back to the backing 418storage. It's a debugging feature so anyone shouldn't rely on it to work 419properly. 420 421Nitin Gupta 422ngupta@vflare.org 423