.. SPDX-License-Identifier: GPL-2.0

======
NILFS2
======

NILFS2 is a log-structured file system (LFS) supporting continuous
snapshotting.  In addition to the versioning capability of the entire
file system, users can even restore files mistakenly overwritten or
destroyed just a few seconds ago.  Since NILFS2 can keep consistency
like conventional LFS, it achieves quick recovery after system
crashes.

NILFS2 creates a number of checkpoints every few seconds or on a per
synchronous write basis (unless there is no change).  Users can select
significant versions among continuously created checkpoints, and can
change them into snapshots which will be preserved until they are
changed back to checkpoints.

There is no limit on the number of snapshots until the volume gets
full.  Each snapshot is mountable as a read-only file system
concurrently with its writable mount, and this feature is convenient
for online backup.

The userland tools are included in the nilfs-utils package, which is
available from the following download page.  At least "mkfs.nilfs2",
"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
cleaner or garbage collector) are required.  Details on the tools are
described in the man pages included in the package.

:Project web page: https://nilfs.sourceforge.io/
:Download page:    https://nilfs.sourceforge.io/en/download.html
:List info:        http://vger.kernel.org/vger-lists.html#linux-nilfs

Caveats
=======

Features which NILFS2 does not support yet:

        - atime
        - extended attributes
        - POSIX ACLs
        - quotas
        - fsck
        - defragmentation

Mount options
=============

NILFS2 supports the following mount options:
(*) == default

======================= =======================================================
barrier(*)              This enables/disables the use of write barriers.  This
nobarrier               requires an IO stack which can support barriers, and
                        if nilfs gets an error on a barrier write, it will
                        disable barriers again with a warning.
errors=continue         Keep going on a filesystem error.
errors=remount-ro(*)    Remount the filesystem read-only on an error.
errors=panic            Panic and halt the machine if an error occurs.
cp=n                    Specify the checkpoint number of the snapshot to be
                        mounted.  Checkpoints and snapshots are listed by the
                        lscp user command.  Only the checkpoints marked as
                        snapshot are mountable with this option.  A snapshot
                        is read-only, so a read-only mount option must be
                        specified together with it.
order=relaxed(*)        Apply relaxed order semantics that allows modified data
                        blocks to be written to disk without making a
                        checkpoint if no metadata update is in progress.  This
                        mode is equivalent to the ordered data mode of the
                        ext3 filesystem except that updates on data blocks
                        still conserve atomicity.  This will improve
                        synchronous write performance for overwriting.
order=strict            Apply strict in-order semantics that preserves the
                        sequence of all file operations including overwriting
                        of data blocks.  That means it is guaranteed that no
                        overtaking of events occurs in the recovered file
                        system after a crash.
norecovery              Disable recovery of the filesystem on mount.
                        This disables every write access on the device for
                        read-only mounts or snapshots.  This option will fail
                        for r/w mounts on an unclean volume.
discard                 This enables/disables the use of discard/TRIM commands.
nodiscard(*)            The discard/TRIM commands are sent to the underlying
                        block device when blocks are freed.  This is useful
                        for SSD devices and sparse/thinly-provisioned LUNs.
======================= =======================================================

Ioctls
======

There is some NILFS2 specific functionality which can be accessed by applications
through the system call interfaces. The list of all NILFS2 specific ioctls is
shown in the table below.

Table of NILFS2 specific ioctls:

 ============================== ===============================================
 Ioctl                          Description
 ============================== ===============================================
 NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
                                checkpoint and snapshot state. This ioctl is
                                used in chcp and mkcp utilities.

 NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
                                This ioctl is used in rmcp utility.

 NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
                                ioctl is used in lscp utility and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_CPSTAT         Return checkpoint statistics. This ioctl is
                                used by lscp, rmcp utilities and by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
                                segments. This ioctl is used in lssu,
                                nilfs_resize utilities and by nilfs_cleanerd
                                daemon.

 NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
                                segments. This ioctl is used by
                                nilfs_cleanerd daemon to skip unnecessary
                                cleaning operation of segments and reduce
                                performance penalty or wear of flash device
                                due to redundant move of in-use blocks.

 NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
                                is used in lssu, nilfs_resize utilities and
                                by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
                                This ioctl is used by nilfs_cleanerd daemon.

 NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
                                block numbers. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
                                environment of requested parameters from
                                userspace. This ioctl is used by
                                nilfs_cleanerd daemon.

 NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
                                mkcp utility.

 NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
                                by nilfs_resize utility.

 NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
                                upper limit of segments in bytes. This ioctl
                                is used by nilfs_resize utility.
 ============================== ===============================================

NILFS2 usage
============

To use nilfs2 as a local file system, simply::

        # mkfs -t nilfs2 /dev/block_device
        # mount -t nilfs2 /dev/block_device /dir

This will also invoke the cleaner through the mount helper program
(mount.nilfs2).

Checkpoints and snapshots are managed by the following commands.
Their manpages are included in the nilfs-utils package above.

  ====  ===========================================================
  lscp  list checkpoints or snapshots.
  mkcp  make a checkpoint or a snapshot.
  chcp  change an existing checkpoint to a snapshot or vice versa.
  rmcp  invalidate specified checkpoint(s).
  ====  ===========================================================

To mount a snapshot::

        # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir

where <cno> is the checkpoint number of the snapshot.

To unmount the NILFS2 mount point or snapshot, simply::

        # umount /dir

Then, the cleaner daemon is automatically shut down by the umount
helper program (umount.nilfs2).

Disk format
===========

A nilfs2 volume is equally divided into a number of segments except
for the super block (SB) and segment #0.  A segment is the container
of logs.  Each log is composed of summary information blocks, payload
blocks, and an optional super root block (SR)::

   ______________________________________________________
  | |SB| | Segment | Segment | Segment | ... | Segment | |
  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
       .             .            (Typical offsets for 4KB-block)
    .                  .
  .______________________.
  | log | log |... | log |
  |__1__|__2__|____|__m__|
        .       .
      .               .
    .                       .
  .______________________________.
  | Summary | Payload blocks  |SR|
  |_blocks__|_________________|__|

The payload blocks are organized per file, and each file consists of
data blocks and B-tree node blocks::

    |<---       File-A        --->|<---       File-B        --->|
   _______________________________________________________________
    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
   _|_____________|_______________|_____________|_______________|_


Since only the modified blocks are written in the log, it may have
files without data blocks or B-tree node blocks.
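
As a rough illustration, the typical offsets in the volume layout
figure above can be reproduced with a short calculation.  The Python
sketch below assumes the values shown in that figure (4KB blocks, 8MB
segments, segment #0 starting at +4K); the constant and function names
are made up for this example, and real volumes may be created with
different parameters:

```python
# Assumptions taken from the "typical offsets" in the figure, not read
# from a real super block.
BLOCK_SIZE = 4096            # 4KB blocks
BLOCKS_PER_SEGMENT = 2048    # 2048 x 4KB = 8MB per segment
FIRST_DATA_BLOCK = 1         # segment #0 starts at block 1 (+4K)


def segment_start_offset(segnum: int) -> int:
    """Return the byte offset at which segment `segnum` starts."""
    if segnum < 0:
        raise ValueError("segment number must be non-negative")
    if segnum == 0:
        # Segment #0 is shorter: the super block area precedes it.
        return FIRST_DATA_BLOCK * BLOCK_SIZE
    return segnum * BLOCKS_PER_SEGMENT * BLOCK_SIZE


if __name__ == "__main__":
    for n in range(4):
        print(f"segment {n}: +{segment_start_offset(n)} bytes")
```

Segment #0 is the only special case in this sketch; every other
segment starts at a multiple of the segment size.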

The organization of the blocks is recorded in the summary information
blocks, which contain a header structure (nilfs_segment_summary), per
file structures (nilfs_finfo), and per block structures (nilfs_binfo)::

   _________________________________________________________________________
  | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
  |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___


The logs include regular files, directory files, symbolic link files
and several meta data files.  The meta data files are the files used
to maintain file system meta data.  The current version of NILFS2 uses
the following meta data files::

 1) Inode file (ifile)             -- Stores on-disk inodes
 2) Checkpoint file (cpfile)       -- Stores checkpoints
 3) Segment usage file (sufile)    -- Stores allocation state of segments
 4) Data address translation file  -- Maps virtual block numbers to usual
    (DAT)                             block numbers.  This file serves to
                                      make on-disk blocks relocatable.
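
The role of the DAT can be pictured with a toy model.  The Python
sketch below is neither the on-disk format nor the kernel
implementation, and the class and method names are hypothetical.  It
only shows the idea of indirection: files refer to virtual block
numbers, so moving a block on disk only requires updating its entry in
the translation table:

```python
class ToyDAT:
    """Toy translation table: virtual block number -> disk block number."""

    def __init__(self) -> None:
        self._map: dict[int, int] = {}
        self._next_vblocknr = 1

    def allocate(self, blocknr: int) -> int:
        """Assign a new virtual block number to a disk block."""
        vblocknr = self._next_vblocknr
        self._next_vblocknr += 1
        self._map[vblocknr] = blocknr
        return vblocknr

    def translate(self, vblocknr: int) -> int:
        """Look up the current disk block for a virtual block number."""
        return self._map[vblocknr]

    def relocate(self, vblocknr: int, new_blocknr: int) -> None:
        """Record that the block now lives at a different disk location."""
        if vblocknr not in self._map:
            raise KeyError(vblocknr)
        self._map[vblocknr] = new_blocknr


dat = ToyDAT()
v = dat.allocate(blocknr=5000)
file_block_pointers = [v]          # what a file would store in this model

dat.relocate(v, new_blocknr=9000)  # the block is moved elsewhere on disk

# The file's pointer is unchanged, yet it now resolves to the new location.
resolved = dat.translate(file_block_pointers[0])
```

In this model the pointer held by the file never changes; only the
table entry does.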

The following figure shows a typical organization of the logs::

   _________________________________________________________________________
  | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
  |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|


To stride over segment boundaries, this sequence of files may be split
into multiple logs.  The sequence of logs that should be treated as
logically one log is delimited with flags marked in the segment
summary.  The recovery code of nilfs2 looks at this boundary
information to ensure atomicity of updates.

The super root block is inserted for every checkpoint.  It includes
three special inodes: the inodes for the DAT, cpfile, and sufile.
Inodes of regular files, directories, symlinks and other special files
are included in the ifile.  The inode of the ifile itself is included
in the corresponding checkpoint entry in the cpfile.  Thus, the
hierarchy among NILFS2 files can be depicted as follows::

   Super block (SB)
       |
       v
   Super root block (the latest cno=xx)
       |-- DAT
       |-- sufile
       `-- cpfile
              |-- ifile (cno=c1)
              |-- ifile (cno=c2) ---- file (ino=i1)
              :        :          |-- file (ino=i2)
              `-- ifile (cno=xx)  |-- file (ino=i3)
                                  :        :
                                  `-- file (ino=yy)
                                    ( regular file, directory, or symlink )

For details on the format of each file, please see nilfs2_ondisk.h
located in the include/uapi/linux directory.

There are no patents or other intellectual property that we protect
with regard to the design of NILFS2.  It is allowed to replicate the
design in hopes that other operating systems could share (mount, read,
write, etc.) data stored in this format.