.. SPDX-License-Identifier: GPL-2.0

.. _fsverity:

=======================================================
fs-verity: read-only file-based authenticity protection
=======================================================

Introduction
============

fs-verity (``fs/verity/``) is a support layer that filesystems can
hook into to support transparent integrity and authenticity protection
of read-only files. Currently, it is supported by the ext4 and f2fs
filesystems. Like fscrypt, not too much filesystem-specific code is
needed to support fs-verity.

fs-verity is similar to `dm-verity
<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
but works on files rather than block devices. On regular files on
filesystems supporting fs-verity, userspace can execute an ioctl that
causes the filesystem to build a Merkle tree for the file and persist
it to a filesystem-specific location associated with the file.

After this, the file is made readonly, and all reads from the file are
automatically verified against the file's Merkle tree. Reads of any
corrupted data, including mmap reads, will fail.

Userspace can use another ioctl to retrieve the root hash (actually
the "file measurement", which is a hash that includes the root hash)
that fs-verity is enforcing for the file. This ioctl executes in
constant time, regardless of the file size.

fs-verity is essentially a way to hash a file in constant time,
subject to the caveat that reads which would violate the hash will
fail at runtime.

Use cases
=========

By itself, the base fs-verity feature only provides integrity
protection, i.e. detection of accidental (non-malicious) corruption.

However, because fs-verity makes retrieving the file hash extremely
efficient, it's primarily meant to be used as a tool to support
authentication (detection of malicious modifications) or auditing
(logging file hashes before use).

Trusted userspace code (e.g. operating system code running on a
read-only partition that is itself authenticated by dm-verity) can
authenticate the contents of an fs-verity file by using the
`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
digital signature of it.

A standard file hash could be used instead of fs-verity. However,
this is inefficient if the file is large and only a small portion may
be accessed.
This is often the case for Android application package
(APK) files, for example. These typically contain many translations,
classes, and other resources that are infrequently or even never
accessed on a particular device. It would be slow and wasteful to
read and hash the entire file before starting the application.

Unlike an ahead-of-time hash, fs-verity also re-verifies data each
time it's paged in. This ensures that malicious disk firmware can't
undetectably change the contents of the file at runtime.

fs-verity does not replace or obsolete dm-verity. dm-verity should
still be used on read-only filesystems. fs-verity is for files that
must live on a read-write filesystem because they are independently
updated and potentially user-installed, so dm-verity cannot be used.

The base fs-verity feature is a hashing mechanism only; actually
authenticating the files is up to userspace. However, to meet some
users' needs, fs-verity optionally supports a simple signature
verification mechanism where users can configure the kernel to require
that all fs-verity files be signed by a key loaded into a keyring; see
`Built-in signature verification`_. Support for fs-verity file hashes
in IMA (Integrity Measurement Architecture) policies is also planned.

User API
========

FS_IOC_ENABLE_VERITY
--------------------

The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file. It takes
in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as
follows::

    struct fsverity_enable_arg {
            __u32 version;
            __u32 hash_algorithm;
            __u32 block_size;
            __u32 salt_size;
            __u64 salt_ptr;
            __u32 sig_size;
            __u32 __reserved1;
            __u64 sig_ptr;
            __u64 __reserved2[11];
    };

This structure contains the parameters of the Merkle tree to build for
the file, and optionally contains a signature. It must be initialized
as follows:

- ``version`` must be 1.
- ``hash_algorithm`` must be the identifier for the hash algorithm to
  use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256. See
  ``include/uapi/linux/fsverity.h`` for the list of possible values.
- ``block_size`` must be the Merkle tree block size. Currently, this
  must be equal to the system page size, which is usually 4096 bytes.
  Other sizes may be supported in the future. This value is not
  necessarily the same as the filesystem block size.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
  provided. The salt is a value that is prepended to every hashed
  block; it can be used to personalize the hashing for a particular
  file or device. Currently the maximum salt size is 32 bytes.
- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
  provided.
- ``sig_size`` is the size of the signature in bytes, or 0 if no
  signature is provided. Currently the signature is (somewhat
  arbitrarily) limited to 16128 bytes. See `Built-in signature
  verification`_ for more information.
- ``sig_ptr`` is the pointer to the signature, or NULL if no
  signature is provided.
- All reserved fields must be zeroed.

FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
the file and persist it to a filesystem-specific location associated
with the file, then mark the file as a verity file. This ioctl may
take a long time to execute on large files, and it is interruptible by
fatal signals.

FS_IOC_ENABLE_VERITY checks for write access to the inode. However,
it must be executed on an O_RDONLY file descriptor and no processes
can have the file open for writing. Attempts to open the file for
writing while this ioctl is executing will fail with ETXTBSY.
(This
is necessary to guarantee that no writable file descriptors will exist
after verity is enabled, and to guarantee that the file's contents are
stable while the Merkle tree is being built over it.)

On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
verity file. On failure (including the case of interruption by a
fatal signal), no changes are made to the file.

FS_IOC_ENABLE_VERITY can fail with the following errors:

- ``EACCES``: the process does not have write access to the file
- ``EBADMSG``: the signature is malformed
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
  reserved bits are set; or the file descriptor refers to neither a
  regular file nor a directory.
- ``EISDIR``: the file descriptor refers to a directory
- ``EKEYREJECTED``: the signature doesn't match the file
- ``EMSGSIZE``: the salt or signature is too long
- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
  needed to verify the signature
- ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
  available in the kernel's crypto API as currently configured (e.g.
  for SHA-512, missing CONFIG_CRYPTO_SHA512).
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support; or the filesystem superblock has not had the 'verity'
  feature enabled on it; or the filesystem does not support fs-verity
  on this file. (See `Filesystem support`_.)
- ``EPERM``: the file is append-only; or, a signature is required and
  one was not provided.
- ``EROFS``: the filesystem is read-only
- ``ETXTBSY``: someone has the file open for writing. This can be the
  caller's file descriptor, another open file descriptor, or the file
  reference held by a writable memory map.

FS_IOC_MEASURE_VERITY
---------------------

The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity
file. The file measurement is a digest that cryptographically
identifies the file contents that are being enforced on reads.

This ioctl takes in a pointer to a variable-length structure::

    struct fsverity_digest {
            __u16 digest_algorithm;
            __u16 digest_size; /* input/output */
            __u8 digest[];
    };

``digest_size`` is an input/output field.
On input, it must be
initialized to the number of bytes allocated for the variable-length
``digest`` field.

On success, 0 is returned and the kernel fills in the structure as
follows:

- ``digest_algorithm`` will be the hash algorithm used for the file
  measurement. It will match ``fsverity_enable_arg::hash_algorithm``.
- ``digest_size`` will be the size of the digest in bytes, e.g. 32
  for SHA-256. (This can be redundant with ``digest_algorithm``.)
- ``digest`` will be the actual bytes of the digest.

FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
regardless of the size of the file.

FS_IOC_MEASURE_VERITY can fail with the following errors:

- ``EFAULT``: the caller provided inaccessible memory
- ``ENODATA``: the file is not a verity file
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support, or the filesystem superblock has not had the 'verity'
  feature enabled on it. (See `Filesystem support`_.)
- ``EOVERFLOW``: the digest is longer than the specified
  ``digest_size`` bytes. Try providing a larger buffer.
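
As a rough illustration of the two ioctls described above, the
following userspace sketch enables verity on a file and then prints
its measurement. It assumes that the UAPI header
``<linux/fsverity.h>`` is installed and that SHA-256 with 4096-byte
Merkle tree blocks, no salt, and no signature is wanted. The helper
names ``enable_verity()`` and ``print_verity_digest()`` and the
64-byte digest buffer are choices made for this example only; they are
not part of the fs-verity API.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/* Hypothetical helper: enable fs-verity on 'path' using SHA-256 and
 * 4096-byte blocks, without salt or signature.
 * Returns 0 on success, -1 on failure with errno set. */
int enable_verity(const char *path)
{
    struct fsverity_enable_arg arg;
    int fd, ret, saved_errno;

    fd = open(path, O_RDONLY);      /* the ioctl requires an O_RDONLY fd */
    if (fd < 0)
        return -1;

    memset(&arg, 0, sizeof(arg));   /* reserved fields must be zeroed */
    arg.version = 1;
    arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
    arg.block_size = 4096;          /* assumed to equal the page size */

    ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret == 0 ? 0 : -1;
}

/* Hypothetical helper: print the file measurement of 'path' in hex.
 * Returns 0 on success, -1 on failure with errno set. */
int print_verity_digest(const char *path)
{
    enum { MAX_DIGEST = 64 };       /* example choice: fits SHA-512 */
    struct fsverity_digest *d;
    int fd, ret, saved_errno;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    d = calloc(1, sizeof(*d) + MAX_DIGEST);
    if (!d) {
        close(fd);
        errno = ENOMEM;
        return -1;
    }
    d->digest_size = MAX_DIGEST;    /* input: bytes available in digest[] */

    ret = ioctl(fd, FS_IOC_MEASURE_VERITY, d);
    saved_errno = errno;
    if (ret == 0) {
        /* output: digest_size now holds the real digest length */
        printf("alg=%u size=%u digest=",
               (unsigned)d->digest_algorithm, (unsigned)d->digest_size);
        for (unsigned i = 0; i < d->digest_size; i++)
            printf("%02x", d->digest[i]);
        printf("\n");
    }
    free(d);
    close(fd);
    errno = saved_errno;
    return ret == 0 ? 0 : -1;
}
```

A caller would typically run ``enable_verity()`` once when installing
a file and ``print_verity_digest()`` whenever the measurement is
needed; on a file without verity the latter is expected to fail with
``ENODATA``, as listed above.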

FS_IOC_GETFLAGS
---------------

The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
can also be used to check whether a file has fs-verity enabled or not.
To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.

The verity flag is not settable via FS_IOC_SETFLAGS. You must use
FS_IOC_ENABLE_VERITY instead, since parameters must be provided.

Accessing verity files
======================

Applications can transparently access a verity file just like a
non-verity one, with the following exceptions:

- Verity files are readonly. They cannot be opened for writing or
  truncate()d, even if the file mode bits allow it. Attempts to do
  one of these things will fail with EPERM. However, changes to
  metadata such as owner, mode, timestamps, and xattrs are still
  allowed, since these are not measured by fs-verity. Verity files
  can also still be renamed, deleted, and linked to.

- Direct I/O is not supported on verity files. Attempts to use direct
  I/O on such files will fall back to buffered I/O.

- DAX (Direct Access) is not supported on verity files, because this
  would circumvent the data verification.

- Reads of data that doesn't match the verity Merkle tree will fail
  with EIO (for read()) or SIGBUS (for mmap() reads).

- If the sysctl "fs.verity.require_signatures" is set to 1 and the
  file's verity measurement is not signed by a key in the fs-verity
  keyring, then opening the file will fail. See `Built-in signature
  verification`_.

Direct access to the Merkle tree is not supported. Therefore, if a
verity file is copied, or is backed up and restored, then it will lose
its "verity"-ness. fs-verity is primarily meant for files like
executables that are managed by a package manager.

File measurement computation
============================

This section describes how fs-verity hashes the file contents using a
Merkle tree to produce the "file measurement" which cryptographically
identifies the file contents. This algorithm is the same for all
filesystems that support fs-verity.

Userspace only needs to be aware of this algorithm if it needs to
compute the file measurement itself, e.g. in order to sign the file.

.. _fsverity_merkle_tree:

Merkle tree
-----------

The file contents are divided into blocks, where the block size is
configurable but is usually 4096 bytes.
The end of the last block is
zero-padded if needed. Each block is then hashed, producing the first
level of hashes. Then, the hashes in this first level are grouped
into 'blocksize'-byte blocks (zero-padding the ends as needed) and
these blocks are hashed, producing the second level of hashes. This
proceeds up the tree until only a single block remains. The hash of
this block is the "Merkle tree root hash".

If the file fits in one block and is nonempty, then the "Merkle tree
root hash" is simply the hash of the single data block. If the file
is empty, then the "Merkle tree root hash" is all zeroes.

The "blocks" here are not necessarily the same as "filesystem blocks".

If a salt was specified, then it's zero-padded to the closest multiple
of the input size of the hash algorithm's compression function, e.g.
64 bytes for SHA-256 or 128 bytes for SHA-512. The padded salt is
prepended to every data or Merkle tree block that is hashed.

The purpose of the block padding is to cause every hash to be taken
over the same amount of data, which simplifies the implementation and
keeps open more possibilities for hardware acceleration. The purpose
of the salt padding is to make the salting "free" when the salted hash
state is precomputed, then imported for each hash.
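
The tree geometry and the salt padding described above come down to a
little integer arithmetic, sketched below. The function names
``merkle_tree_blocks()`` and ``padded_salt_size()`` are invented for
this illustration and do not exist in the kernel; the sketch only
counts blocks and does no hashing.

```c
#include <stdint.h>

/* Illustrative helper: round the salt size up to a multiple of the
 * hash algorithm's compression-function input size (64 bytes for
 * SHA-256, 128 bytes for SHA-512). */
uint32_t padded_salt_size(uint32_t salt_size, uint32_t hash_input_size)
{
    return (salt_size + hash_input_size - 1) / hash_input_size
           * hash_input_size;
}

/* Illustrative helper: count the hash blocks in the Merkle tree built
 * over a file of 'file_size' bytes. Data blocks are not counted. */
uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
                            uint32_t digest_size)
{
    uint64_t hashes_per_block = block_size / digest_size;
    /* Number of data blocks; the last one is zero-padded if partial. */
    uint64_t blocks = file_size / block_size +
                      (file_size % block_size != 0);
    uint64_t total = 0;

    /* Each level holds one hash per block of the level below, packed
     * into zero-padded blocks, until a single block remains. A file
     * that fits in one block needs no tree blocks at all, because
     * the root hash is then the hash of that data block. */
    while (blocks > 1) {
        blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
        total += blocks;
    }
    return total;
}
```

For a 1 GiB file with SHA-256 and 4096-byte blocks, this gives
2048 + 16 + 1 = 2065 tree blocks, roughly 8 MiB, and a 20-byte salt
would be padded to 64 bytes.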

Example: in the recommended configuration of SHA-256 and 4K blocks,
128 hash values fit in each block. Thus, each level of the Merkle
tree is approximately 128 times smaller than the previous, and for
large files the Merkle tree's size converges to approximately 1/127 of
the original file size. However, for small files, the padding is
significant, making the space overhead proportionally more.

.. _fsverity_descriptor:

fs-verity descriptor
--------------------

By itself, the Merkle tree root hash is ambiguous. For example, it
can't distinguish a large file from a small second file whose data
is exactly the top-level hash block of the first file. Ambiguities
also arise from the convention of padding to the next block boundary.

To solve this problem, the verity file measurement is actually
computed as a hash of the following structure, which contains the
Merkle tree root hash as well as other fields such as the file size::

    struct fsverity_descriptor {
            __u8 version;           /* must be 1 */
            __u8 hash_algorithm;    /* Merkle tree hash algorithm */
            __u8 log_blocksize;     /* log2 of size of data and tree blocks */
            __u8 salt_size;         /* size of salt in bytes; 0 if none */
            __le32 sig_size;        /* must be 0 */
            __le64 data_size;       /* size of file the Merkle tree is built over */
            __u8 root_hash[64];     /* Merkle tree root hash */
            __u8 salt[32];          /* salt prepended to each hashed block */
            __u8 __reserved[144];   /* must be 0's */
    };

Note that the ``sig_size`` field must be set to 0 for the purpose of
computing the file measurement, even if a signature was provided (or
will be provided) to `FS_IOC_ENABLE_VERITY`_.

Built-in signature verification
===============================

With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
a portion of an authentication policy (see `Use cases`_) in the
kernel. Specifically, it adds support for:

1. At fs-verity module initialization time, a keyring ".fs-verity" is
   created.
   The root user can add trusted X.509 certificates to this
   keyring using the add_key() system call, then (when done)
   optionally use keyctl_restrict_keyring() to prevent additional
   certificates from being added.

2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
   detached signature in DER format of the file measurement. On
   success, this signature is persisted alongside the Merkle tree.
   Then, any time the file is opened, the kernel will verify the
   file's actual measurement against this signature, using the
   certificates in the ".fs-verity" keyring.

3. A new sysctl "fs.verity.require_signatures" is made available.
   When set to 1, the kernel requires that all verity files have a
   correctly signed file measurement as described in (2).

File measurements must be signed in the following format, which is
similar to the structure used by `FS_IOC_MEASURE_VERITY`_::

    struct fsverity_signed_digest {
            char magic[8];                  /* must be "FSVerity" */
            __le16 digest_algorithm;
            __le16 digest_size;
            __u8 digest[];
    };

fs-verity's built-in signature verification support is meant as a
relatively simple mechanism that can be used to provide some level of
authenticity protection for verity files, as an alternative to doing
the signature verification in userspace or using IMA-appraisal.
However, with this mechanism, userspace programs still need to check
that the verity bit is set, and there is no protection against verity
files being swapped around.

Filesystem support
==================

fs-verity is currently supported by the ext4 and f2fs filesystems.
The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
on either filesystem.

``include/linux/fsverity.h`` declares the interface between the
``fs/verity/`` support layer and filesystems.
Briefly, filesystems
must provide an ``fsverity_operations`` structure that provides
methods to read and write the verity metadata to a filesystem-specific
location, including the Merkle tree blocks and
``fsverity_descriptor``. Filesystems must also call functions in
``fs/verity/`` at certain times, such as when a file is opened or when
pages have been read into the pagecache. (See `Verifying data`_.)

ext4
----

ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.

To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it. "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
versions of e2fsck will be unable to check the filesystem. Moreover,
currently ext4 only supports mounting a filesystem with the "verity"
feature when its block size is equal to PAGE_SIZE (often 4096 bytes).

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files. It
can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.

ext4 also supports encryption, which can be used simultaneously with
fs-verity. In this case, the plaintext data is verified rather than
the ciphertext.
This is necessary in order to make the file
measurement meaningful, since every file is encrypted differently.

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size. This approach works because (a) verity files are readonly,
and (b) pages fully beyond i_size aren't visible to userspace but can
be read/written internally by ext4 with only some relatively small
changes to ext4. This approach avoids having to depend on the
EA_INODE feature and on rearchitecting ext4's xattr support to
support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs. Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

Currently, ext4 verity only supports the case where the Merkle tree
block size, filesystem block size, and page size are all the same. It
also only supports extent-based files.

f2fs
----

f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.

To create verity files on an f2fs filesystem, the filesystem must have
been formatted with ``-O verity``.

f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
cleared.

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size. See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode
which wouldn't be enough for even a single Merkle tree block.

Currently, f2fs verity only supports a Merkle tree block size of 4096.
Also, f2fs doesn't support enabling verity on files that currently
have atomic or volatile writes pending.

Implementation details
======================

Verifying data
--------------

fs-verity ensures that all reads of a verity file's data are verified,
regardless of which syscall is used to do the read (e.g. mmap(),
read(), pread()) and regardless of whether it's the first read or a
later read (unless the later read can return cached data that was
already verified). Below, we describe how filesystems implement this.

Pagecache
~~~~~~~~~

For filesystems using Linux's pagecache, the ``->readpage()`` and
``->readpages()`` methods must be modified to verify pages before they
are marked Uptodate. Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

Therefore, fs/verity/ provides a function fsverity_verify_page() which
verifies a page that has been read into the pagecache of a verity
inode, but is still locked and not Uptodate, so it's not yet readable
by userspace.  As needed to do the verification,
fsverity_verify_page() will call back into the filesystem to read
Merkle tree pages via fsverity_operations::read_merkle_tree_page().

fsverity_verify_page() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate.  Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

fsverity_verify_page() currently only supports the case where the
Merkle tree block size is equal to PAGE_SIZE (often 4096 bytes).

In principle, fsverity_verify_page() verifies the entire path in the
Merkle tree from the data page to the root hash.  However, for
efficiency the filesystem may cache the hash pages.  Therefore,
fsverity_verify_page() only ascends the tree reading hash pages until
an already-verified hash page is seen, as indicated by the PageChecked
bit being set.  It then verifies the path to that page.
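
As a rough illustration of the ascent described above, the following
userspace sketch models hash pages as small structs whose ``checked``
flag stands in for PageChecked.  It is not the kernel implementation:
``toy_hash()`` is FNV-1a used as a stand-in for SHA-256, and ``ARITY``,
``struct hash_block``, ``struct toy_tree``, and ``toy_verify_block()``
are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARITY      4	/* hashes per hash block (128 for 4K blocks and SHA-256) */
#define MAX_LEVELS 8

/* Toy model of a Merkle tree hash page. */
struct hash_block {
	uint64_t child[ARITY];	/* hashes of the ARITY blocks one level below */
	bool checked;		/* stands in for the PageChecked bit */
};

struct toy_tree {
	unsigned int num_levels;
	struct hash_block *levels[MAX_LEVELS];	/* levels[0] hashes the data blocks */
	uint64_t root_hash;			/* trusted hash of the top hash block */
};

/* FNV-1a, used only as a stand-in for a cryptographic hash such as SHA-256. */
static uint64_t toy_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;

	while (len--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Verify data block number 'index'; returns false if any hash mismatches. */
static bool toy_verify_block(struct toy_tree *t, const void *data,
			     size_t len, uint64_t index)
{
	struct hash_block *path[MAX_LEVELS];
	unsigned int slot[MAX_LEVELS];
	uint64_t want = t->root_hash;
	unsigned int level;

	assert(t->num_levels <= MAX_LEVELS);

	/* Ascend, remembering unverified hash blocks, until a checked one is seen. */
	for (level = 0; level < t->num_levels; level++) {
		struct hash_block *hb = &t->levels[level][index / ARITY];
		unsigned int s = index % ARITY;

		index /= ARITY;
		if (hb->checked) {
			want = hb->child[s];	/* already trusted: stop ascending */
			break;
		}
		path[level] = hb;
		slot[level] = s;
	}

	/* Descend, verifying each remembered hash block against its parent. */
	while (level > 0) {
		level--;
		if (toy_hash(path[level]->child, sizeof(path[level]->child)) != want)
			return false;
		path[level]->checked = true;
		want = path[level]->child[slot[level]];
	}

	/* Finally, check the data block against the bottom-level hash. */
	return toy_hash(data, len) == want;
}
```

In this model, verifying a second data block that shares a
bottom-level hash block with an already-verified one stops ascending
at level 0 and hashes only the data block itself.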

This optimization, which is also used by dm-verity, results in
excellent sequential read performance.  This is because usually (e.g.
127 in 128 times for 4K blocks and SHA-256) the hash page from the
bottom level of the tree will already be cached and checked from
reading a previous data page.  However, random reads perform worse.

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too.  However, they
also usually read many pages from a file at once, grouped into a
structure called a "bio".  To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
fsverity_verify_bio() which verifies all pages in a bio.

ext4 and f2fs also support encryption.  If a verity file is also
encrypted, the pages must be decrypted before being verified.
To support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

    struct bio_post_read_ctx {
        struct bio *bio;
        struct work_struct work;
        unsigned int cur_step;
        unsigned int enabled_steps;
    };

``enabled_steps`` is a bitmask that specifies whether decryption,
verity, or both are enabled.  After the bio completes, for each needed
postprocessing step the filesystem enqueues the bio_post_read_ctx on a
workqueue, and then the workqueue work does the decryption or
verification.  Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.

Files on ext4 and f2fs may contain holes.  Normally, ``->readpages()``
simply zeroes holes and sets the corresponding pages Uptodate; no bios
are issued.  To prevent this case from bypassing fs-verity, these
filesystems use fsverity_verify_page() to verify hole pages.

ext4 and f2fs disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity.  (They also do the same for
encrypted files.)
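
The post-read step sequencing described in this subsection can be
modeled in userspace roughly as follows.  This is only a sketch: the
enum values, ``struct toy_post_read_ctx``, and ``toy_next_step()`` are
hypothetical names, and a real filesystem hands each step to a
workqueue rather than returning it to a caller.

```c
#include <assert.h>

/* Hypothetical step numbering; decryption must come before verity. */
enum toy_post_read_step {
	TOY_STEP_INITIAL = 0,
	TOY_STEP_DECRYPT,
	TOY_STEP_VERITY,
	TOY_STEP_DONE,	/* all steps finished: pages can be finalized */
};

/* Reduced model of the post-read context: only the step bookkeeping. */
struct toy_post_read_ctx {
	unsigned int cur_step;		/* last step that was started */
	unsigned int enabled_steps;	/* bitmask of (1U << step) */
};

/*
 * Advance to the next enabled step and return it, skipping disabled
 * steps.  Returns TOY_STEP_DONE once no enabled step remains.  Must
 * not be called again after TOY_STEP_DONE has been returned.
 */
static enum toy_post_read_step toy_next_step(struct toy_post_read_ctx *ctx)
{
	assert(ctx->cur_step < TOY_STEP_DONE);

	while (++ctx->cur_step < TOY_STEP_DONE) {
		if (ctx->enabled_steps & (1U << ctx->cur_step))
			return (enum toy_post_read_step)ctx->cur_step;
	}
	return TOY_STEP_DONE;
}
```

With both bits set in ``enabled_steps``, successive calls would yield
the decryption step, then the verity step, then the done marker; with
only the verity bit set, the decryption step is skipped.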

Userspace utility
=================

This document focuses on the kernel, but a userspace utility for
fs-verity can be found at:

	https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git

See the README.md file in the fsverity-utils source tree for details,
including examples of setting up fs-verity protected files.

Tests
=====

To test fs-verity, use xfstests.  For example, using `kvm-xfstests
<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::

    kvm-xfstests -c ext4,f2fs -g verity

FAQ
===

This section answers frequently asked questions about fs-verity that
weren't already directly answered in other parts of this document.

:Q: Why isn't fs-verity part of IMA?
:A: fs-verity and IMA (Integrity Measurement Architecture) have
    different focuses.  fs-verity is a filesystem-level mechanism for
    hashing individual files using a Merkle tree.  In contrast, IMA
    defines a system-wide policy that specifies which files are
    hashed and what to do with those hashes, such as log them,
    authenticate them, or add them to a measurement list.

    IMA is planned to support the fs-verity hashing mechanism as an
    alternative to doing full file hashes, for people who want the
    performance and security benefits of the Merkle tree based hash.
    But it doesn't make sense to force all uses of fs-verity to be
    through IMA.  As a standalone filesystem feature, fs-verity
    already meets many users' needs, and it's testable like other
    filesystem features, e.g. with xfstests.

:Q: Isn't fs-verity useless because the attacker can just modify the
    hashes in the Merkle tree, which is stored on-disk?
:A: To verify the authenticity of an fs-verity file you must verify
    the authenticity of the "file measurement", which is basically the
    root hash of the Merkle tree.  See `Use cases`_.

:Q: Isn't fs-verity useless because the attacker can just replace a
    verity file with a non-verity one?
:A: See `Use cases`_.  In the initial use case, it's really trusted
    userspace code that authenticates the files; fs-verity is just a
    tool to do this job efficiently and securely.  The trusted
    userspace code will consider non-verity files to be inauthentic.

:Q: Why does the Merkle tree need to be stored on-disk?  Couldn't you
    store just the root hash?
:A: If the Merkle tree weren't stored on-disk, then you'd have to
    compute the entire tree when the file is first accessed, even if
    just one byte is being read.  This is a fundamental consequence of
    how Merkle tree hashing works.  To verify a leaf node, you need to
    verify the whole path to the root hash, including the root node
    (the thing which the root hash is a hash of).  But if the root
    node isn't stored on-disk, you have to compute it by hashing its
    children, and so on until you've actually hashed the entire file.

    That defeats most of the point of doing a Merkle tree-based hash,
    since if you have to hash the whole file ahead of time anyway,
    then you could simply do sha256(file) instead.  That would be much
    simpler, and a bit faster too.

    It's true that an in-memory Merkle tree could still provide the
    advantage of verification on every read rather than just on the
    first read.  However, it would be inefficient because every time a
    hash page gets evicted (you can't pin the entire Merkle tree into
    memory, since it may be very large), in order to restore it you
    again need to hash everything below it in the tree.  This again
    defeats most of the point of doing a Merkle tree-based hash, since
    a single block read could trigger re-hashing gigabytes of data.

:Q: But couldn't you store just the leaf nodes and compute the rest?
:A: See previous answer; this really just moves up one level, since
    one could alternatively interpret the data blocks as being the
    leaf nodes of the Merkle tree.  It's true that the tree can be
    computed much faster if the leaf level is stored rather than just
    the data, but that's only because each level is less than 1% the
    size of the level below (assuming the recommended settings of
    SHA-256 and 4K blocks).  For the exact same reason, by storing
    "just the leaf nodes" you'd already be storing over 99% of the
    tree, so you might as well simply store the whole tree.

:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
    part of a package that is installed to many computers?
:A: This isn't currently supported.  It was part of the original
    design, but was removed to simplify the kernel UAPI and because it
    wasn't a critical use case.  Files are usually installed once and
    used many times, and cryptographic hashing is somewhat fast on
    most modern processors.

:Q: Why doesn't fs-verity support writes?
:A: Write support would be very difficult and would require a
    completely different design, so it's well outside the scope of
    fs-verity.
    Write support would require:

    - A way to maintain consistency between the data and hashes,
      including all levels of hashes, since corruption after a crash
      (especially of potentially the entire file!) is unacceptable.
      The main options for solving this are data journalling,
      copy-on-write, and log-structured volume.  But it's very hard to
      retrofit existing filesystems with new consistency mechanisms.
      Data journalling is available on ext4, but is very slow.

    - Rebuilding the Merkle tree after every write, which would be
      extremely inefficient.  Alternatively, a different authenticated
      dictionary structure such as an "authenticated skiplist" could
      be used.  However, this would be far more complex.

    Compare it to dm-verity vs. dm-integrity.  dm-verity is very
    simple: the kernel just verifies read-only data against a
    read-only Merkle tree.  In contrast, dm-integrity supports writes
    but is slow, is much more complex, and doesn't actually support
    full-device authentication since it authenticates each sector
    independently, i.e. there is no "root hash".  It doesn't really
    make sense for the same device-mapper target to support these two
    very different cases; the same applies to fs-verity.

:Q: Since verity files are immutable, why isn't the immutable bit set?
:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
    specific set of semantics which not only make the file contents
    read-only, but also prevent the file from being deleted, renamed,
    linked to, or having its owner or mode changed.  These extra
    properties are unwanted for fs-verity, so reusing the immutable
    bit isn't appropriate.

:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
:A: Abusing the xattr interface for basically arbitrary syscalls is
    heavily frowned upon by most of the Linux filesystem developers.
    An xattr should really just be an xattr on-disk, not an API to
    e.g. magically trigger construction of a Merkle tree.

:Q: Does fs-verity support remote filesystems?
:A: Only ext4 and f2fs support is implemented currently, but in
    principle any filesystem that can store per-file verity metadata
    can support fs-verity, regardless of whether it's local or remote.
    Some filesystems may have fewer options of where to store the
    verity metadata; one possibility is to store it past the end of
    the file and "hide" it from userspace by manipulating i_size.  The
    data verification functions provided by ``fs/verity/`` also assume
    that the filesystem uses the Linux pagecache, but both local and
    remote filesystems normally do so.

:Q: Why is anything filesystem-specific at all?
    Shouldn't fs-verity be implemented entirely at the VFS level?
:A: There are many reasons why this is not possible or would be very
    difficult, including the following:

    - To prevent bypassing verification, pages must not be marked
      Uptodate until they've been verified.  Currently, each
      filesystem is responsible for marking pages Uptodate via
      ``->readpages()``.  Therefore, currently it's not possible for
      the VFS to do the verification on its own.  Changing this would
      require significant changes to the VFS and all filesystems.

    - It would require defining a filesystem-independent way to store
      the verity metadata.  Extended attributes don't work for this
      because (a) the Merkle tree may be gigabytes, but many
      filesystems assume that all xattrs fit into a single 4K
      filesystem block, and (b) ext4 and f2fs encryption doesn't
      encrypt xattrs, yet the Merkle tree *must* be encrypted when the
      file contents are, because it stores hashes of the plaintext
      file contents.

      So the verity metadata would have to be stored in an actual
      file.  Using a separate file would be very ugly, since the
      metadata is fundamentally part of the file to be protected, and
      it could cause problems where users could delete the real file
      but not the metadata file or vice versa.
      On the other hand,
      having it be in the same file would break applications unless
      filesystems' notion of i_size were divorced from the VFS's,
      which would be complex and require changes to all filesystems.

    - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
      transaction mechanism so that either the file ends up with
      verity enabled, or no changes were made.  Allowing intermediate
      states to occur after a crash may cause problems.