.. SPDX-License-Identifier: GPL-2.0

.. _fsverity:

=======================================================
fs-verity: read-only file-based authenticity protection
=======================================================

Introduction
============

fs-verity (``fs/verity/``) is a support layer that filesystems can
hook into to support transparent integrity and authenticity protection
of read-only files.  Currently, it is supported by the ext4 and f2fs
filesystems.  Like fscrypt, not too much filesystem-specific code is
needed to support fs-verity.

fs-verity is similar to `dm-verity
<https://www.kernel.org/doc/Documentation/device-mapper/verity.txt>`_
but works on files rather than block devices.  On regular files on
filesystems supporting fs-verity, userspace can execute an ioctl that
causes the filesystem to build a Merkle tree for the file and persist
it to a filesystem-specific location associated with the file.

After this, the file is made readonly, and all reads from the file are
automatically verified against the file's Merkle tree.  Reads of any
corrupted data, including mmap reads, will fail.

Userspace can use another ioctl to retrieve the root hash (actually
the "file measurement", which is a hash that includes the root hash)
that fs-verity is enforcing for the file.  This ioctl executes in
constant time, regardless of the file size.

fs-verity is essentially a way to hash a file in constant time,
subject to the caveat that reads which would violate the hash will
fail at runtime.

Use cases
=========

By itself, the base fs-verity feature only provides integrity
protection, i.e. detection of accidental (non-malicious) corruption.

However, because fs-verity makes retrieving the file hash extremely
efficient, it's primarily meant to be used as a tool to support
authentication (detection of malicious modifications) or auditing
(logging file hashes before use).

Trusted userspace code (e.g. operating system code running on a
read-only partition that is itself authenticated by dm-verity) can
authenticate the contents of an fs-verity file by using the
`FS_IOC_MEASURE_VERITY`_ ioctl to retrieve its hash, then verifying a
digital signature of it.

A standard file hash could be used instead of fs-verity.  However,
this is inefficient if the file is large and only a small portion may
be accessed.  This is often the case for Android application package
(APK) files, for example.  These typically contain many translations,
classes, and other resources that are infrequently or even never
accessed on a particular device.  It would be slow and wasteful to
read and hash the entire file before starting the application.

Unlike an ahead-of-time hash, fs-verity also re-verifies data each
time it's paged in.  This ensures that malicious disk firmware can't
undetectably change the contents of the file at runtime.

fs-verity does not replace or obsolete dm-verity.  dm-verity should
still be used on read-only filesystems.  fs-verity is for files that
must live on a read-write filesystem because they are independently
updated and potentially user-installed, so dm-verity cannot be used.

The base fs-verity feature is a hashing mechanism only; actually
authenticating the files is up to userspace.  However, to meet some
users' needs, fs-verity optionally supports a simple signature
verification mechanism where users can configure the kernel to require
that all fs-verity files be signed by a key loaded into a keyring; see
`Built-in signature verification`_.  Support for fs-verity file hashes
in IMA (Integrity Measurement Architecture) policies is also planned.

User API
========

FS_IOC_ENABLE_VERITY
--------------------

The FS_IOC_ENABLE_VERITY ioctl enables fs-verity on a file.  It takes
in a pointer to a :c:type:`struct fsverity_enable_arg`, defined as
follows::

    struct fsverity_enable_arg {
            __u32 version;
            __u32 hash_algorithm;
            __u32 block_size;
            __u32 salt_size;
            __u64 salt_ptr;
            __u32 sig_size;
            __u32 __reserved1;
            __u64 sig_ptr;
            __u64 __reserved2[11];
    };

This structure contains the parameters of the Merkle tree to build for
the file, and optionally contains a signature.  It must be initialized
as follows:

- ``version`` must be 1.
- ``hash_algorithm`` must be the identifier for the hash algorithm to
  use for the Merkle tree, such as FS_VERITY_HASH_ALG_SHA256.  See
  ``include/uapi/linux/fsverity.h`` for the list of possible values.
- ``block_size`` must be the Merkle tree block size.  Currently, this
  must be equal to the system page size, which is usually 4096 bytes.
  Other sizes may be supported in the future.  This value is not
  necessarily the same as the filesystem block size.
- ``salt_size`` is the size of the salt in bytes, or 0 if no salt is
  provided.  The salt is a value that is prepended to every hashed
  block; it can be used to personalize the hashing for a particular
  file or device.  Currently the maximum salt size is 32 bytes.
- ``salt_ptr`` is the pointer to the salt, or NULL if no salt is
  provided.
- ``sig_size`` is the size of the signature in bytes, or 0 if no
  signature is provided.  Currently the signature is (somewhat
  arbitrarily) limited to 16128 bytes.  See `Built-in signature
  verification`_ for more information.
- ``sig_ptr`` is the pointer to the signature, or NULL if no
  signature is provided.
- All reserved fields must be zeroed.

FS_IOC_ENABLE_VERITY causes the filesystem to build a Merkle tree for
the file and persist it to a filesystem-specific location associated
with the file, then mark the file as a verity file.  This ioctl may
take a long time to execute on large files, and it is interruptible by
fatal signals.

FS_IOC_ENABLE_VERITY checks for write access to the inode.  However,
it must be executed on an O_RDONLY file descriptor and no processes
can have the file open for writing.  Attempts to open the file for
writing while this ioctl is executing will fail with ETXTBSY.  (This
is necessary to guarantee that no writable file descriptors will exist
after verity is enabled, and to guarantee that the file's contents are
stable while the Merkle tree is being built over it.)

On success, FS_IOC_ENABLE_VERITY returns 0, and the file becomes a
verity file.  On failure (including the case of interruption by a
fatal signal), no changes are made to the file.

FS_IOC_ENABLE_VERITY can fail with the following errors:

- ``EACCES``: the process does not have write access to the file
- ``EBADMSG``: the signature is malformed
- ``EBUSY``: this ioctl is already running on the file
- ``EEXIST``: the file already has verity enabled
- ``EFAULT``: the caller provided inaccessible memory
- ``EINTR``: the operation was interrupted by a fatal signal
- ``EINVAL``: unsupported version, hash algorithm, or block size; or
  reserved bits are set; or the file descriptor refers to neither a
  regular file nor a directory.
- ``EISDIR``: the file descriptor refers to a directory
- ``EKEYREJECTED``: the signature doesn't match the file
- ``EMSGSIZE``: the salt or signature is too long
- ``ENOKEY``: the fs-verity keyring doesn't contain the certificate
  needed to verify the signature
- ``ENOPKG``: fs-verity recognizes the hash algorithm, but it's not
  available in the kernel's crypto API as currently configured (e.g.
  for SHA-512, missing CONFIG_CRYPTO_SHA512).
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support; or the filesystem superblock has not had the 'verity'
  feature enabled on it; or the filesystem does not support fs-verity
  on this file.  (See `Filesystem support`_.)
- ``EPERM``: the file is append-only; or, a signature is required and
  one was not provided.
- ``EROFS``: the filesystem is read-only
- ``ETXTBSY``: someone has the file open for writing.  This can be the
  caller's file descriptor, another open file descriptor, or the file
  reference held by a writable memory map.
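
The initialization rules above can be illustrated with a minimal
userspace sketch.  It assumes the installed kernel headers provide
``<linux/fsverity.h>``; the helper name ``enable_verity_sha256()`` is
hypothetical, and the choice of SHA-256 with no salt and no signature
is only one possible configuration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

/*
 * Hypothetical helper: enable fs-verity on `path` using SHA-256, a Merkle
 * tree block size equal to the system page size, no salt, and no signature.
 * Returns 0 on success, or -1 with errno set.
 */
int enable_verity_sha256(const char *path)
{
        struct fsverity_enable_arg arg;
        int fd, ret, saved_errno;

        /* The ioctl must be issued on an O_RDONLY file descriptor. */
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;

        /* memset() also zeroes the reserved fields and the NULL pointers. */
        memset(&arg, 0, sizeof(arg));
        arg.version = 1;
        arg.hash_algorithm = FS_VERITY_HASH_ALG_SHA256;
        arg.block_size = (unsigned int)sysconf(_SC_PAGESIZE);

        ret = ioctl(fd, FS_IOC_ENABLE_VERITY, &arg);
        saved_errno = errno;    /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        return ret;
}
```

A caller would typically inspect ``errno`` on failure and map it to
one of the error cases listed above, e.g. ``EEXIST`` when verity is
already enabled.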

FS_IOC_MEASURE_VERITY
---------------------

The FS_IOC_MEASURE_VERITY ioctl retrieves the measurement of a verity
file.  The file measurement is a digest that cryptographically
identifies the file contents that are being enforced on reads.

This ioctl takes in a pointer to a variable-length structure::

    struct fsverity_digest {
            __u16 digest_algorithm;
            __u16 digest_size; /* input/output */
            __u8 digest[];
    };

``digest_size`` is an input/output field.  On input, it must be
initialized to the number of bytes allocated for the variable-length
``digest`` field.

On success, 0 is returned and the kernel fills in the structure as
follows:

- ``digest_algorithm`` will be the hash algorithm used for the file
  measurement.  It will match ``fsverity_enable_arg::hash_algorithm``.
- ``digest_size`` will be the size of the digest in bytes, e.g. 32
  for SHA-256.  (This can be redundant with ``digest_algorithm``.)
- ``digest`` will be the actual bytes of the digest.

FS_IOC_MEASURE_VERITY is guaranteed to execute in constant time,
regardless of the size of the file.

FS_IOC_MEASURE_VERITY can fail with the following errors:

- ``EFAULT``: the caller provided inaccessible memory
- ``ENODATA``: the file is not a verity file
- ``ENOTTY``: this type of filesystem does not implement fs-verity
- ``EOPNOTSUPP``: the kernel was not configured with fs-verity
  support, or the filesystem superblock has not had the 'verity'
  feature enabled on it.  (See `Filesystem support`_.)
- ``EOVERFLOW``: the digest is longer than the specified
  ``digest_size`` bytes.  Try providing a larger buffer.
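
The input/output handling of ``digest_size`` can be sketched as
follows.  This is an illustration only: ``measure_verity()`` and
``SKETCH_MAX_DIGEST_SIZE`` are hypothetical names, and the 64-byte
capacity is an assumption chosen to be large enough for SHA-512.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fsverity.h>

#define SKETCH_MAX_DIGEST_SIZE 64       /* assumption: fits SHA-512 */

/*
 * Hypothetical helper: copy the file measurement of `path` into `out`
 * (which must hold SKETCH_MAX_DIGEST_SIZE bytes) and store the hash
 * algorithm identifier in *alg.  Returns the digest size in bytes on
 * success, or -1 with errno set (e.g. ENODATA for a non-verity file).
 */
int measure_verity(const char *path, unsigned char *out, unsigned int *alg)
{
        struct fsverity_digest *d;
        int fd, ret, saved_errno;

        /* Allocate the header plus room for the variable-length digest. */
        d = calloc(1, sizeof(*d) + SKETCH_MAX_DIGEST_SIZE);
        if (!d)
                return -1;

        fd = open(path, O_RDONLY);
        if (fd < 0) {
                saved_errno = errno;
                free(d);
                errno = saved_errno;
                return -1;
        }

        /* On input, digest_size is the capacity of digest[]. */
        d->digest_size = SKETCH_MAX_DIGEST_SIZE;
        ret = ioctl(fd, FS_IOC_MEASURE_VERITY, d);
        saved_errno = errno;
        close(fd);

        if (ret == 0) {
                /* On output, digest_size is the actual digest length. */
                memcpy(out, d->digest, d->digest_size);
                *alg = d->digest_algorithm;
                ret = d->digest_size;
        }
        free(d);
        errno = saved_errno;
        return ret;
}
```

Trusted userspace code could compare the returned digest against a
signed expected value before using the file.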

FS_IOC_GETFLAGS
---------------

The existing ioctl FS_IOC_GETFLAGS (which isn't specific to fs-verity)
can also be used to check whether a file has fs-verity enabled or not.
To do so, check for FS_VERITY_FL (0x00100000) in the returned flags.

The verity flag is not settable via FS_IOC_SETFLAGS.  You must use
FS_IOC_ENABLE_VERITY instead, since parameters must be provided.
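
A minimal sketch of the flag check described above.  The helper name
``has_verity_flag()`` is hypothetical, and the fallback definition of
FS_VERITY_FL simply reuses the value quoted in the text in case the
installed ``<linux/fs.h>`` predates the flag.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef FS_VERITY_FL
#define FS_VERITY_FL 0x00100000         /* value quoted in the text above */
#endif

/*
 * Hypothetical helper: returns 1 if `path` has the verity flag set,
 * 0 if it does not, or -1 with errno set if the flags cannot be read.
 */
int has_verity_flag(const char *path)
{
        int flags = 0;
        int fd, ret, saved_errno;

        fd = open(path, O_RDONLY);
        if (fd < 0)
                return -1;

        ret = ioctl(fd, FS_IOC_GETFLAGS, &flags);
        saved_errno = errno;    /* close() may overwrite errno */
        close(fd);
        errno = saved_errno;
        if (ret != 0)
                return -1;

        return (flags & FS_VERITY_FL) ? 1 : 0;
}
```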

Accessing verity files
======================

Applications can transparently access a verity file just like a
non-verity one, with the following exceptions:

- Verity files are readonly.  They cannot be opened for writing or
  truncate()d, even if the file mode bits allow it.  Attempts to do
  one of these things will fail with EPERM.  However, changes to
  metadata such as owner, mode, timestamps, and xattrs are still
  allowed, since these are not measured by fs-verity.  Verity files
  can also still be renamed, deleted, and linked to.

- Direct I/O is not supported on verity files.  Attempts to use direct
  I/O on such files will fall back to buffered I/O.

- DAX (Direct Access) is not supported on verity files, because this
  would circumvent the data verification.

- Reads of data that doesn't match the verity Merkle tree will fail
  with EIO (for read()) or SIGBUS (for mmap() reads).

- If the sysctl "fs.verity.require_signatures" is set to 1 and the
  file's verity measurement is not signed by a key in the fs-verity
  keyring, then opening the file will fail.  See `Built-in signature
  verification`_.

Direct access to the Merkle tree is not supported.  Therefore, if a
verity file is copied, or is backed up and restored, then it will lose
its "verity"-ness.  fs-verity is primarily meant for files like
executables that are managed by a package manager.
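
Because a failed verification surfaces as an ordinary read error, a
reader mainly has to avoid using partially filled buffers.  The loop
below is a generic sketch, not something specific to fs-verity;
``pread_full()`` is a hypothetical helper name.  Note that EIO can
also indicate a genuine I/O error, so it cannot be attributed to a
Merkle tree mismatch with certainty.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to `len` bytes at offset `off`.
 * Returns the number of bytes read (less than `len` only at end of
 * file), or -1 with errno set.  On -1 the buffer contents must be
 * treated as untrusted; on a verity file, errno == EIO is what a
 * failed verification looks like to the caller.
 */
ssize_t pread_full(int fd, void *buf, size_t len, off_t off)
{
        size_t done = 0;

        while (done < len) {
                ssize_t n = pread(fd, (char *)buf + done, len - done,
                                  off + (off_t)done);
                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* retry interrupted reads */
                        return -1;
                }
                if (n == 0)
                        break;                  /* end of file */
                done += (size_t)n;
        }
        return (ssize_t)done;
}
```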

File measurement computation
============================

This section describes how fs-verity hashes the file contents using a
Merkle tree to produce the "file measurement" which cryptographically
identifies the file contents.  This algorithm is the same for all
filesystems that support fs-verity.

Userspace only needs to be aware of this algorithm if it needs to
compute the file measurement itself, e.g. in order to sign the file.

.. _fsverity_merkle_tree:

Merkle tree
-----------

The file contents are divided into blocks, where the block size is
configurable but is usually 4096 bytes.  The end of the last block is
zero-padded if needed.  Each block is then hashed, producing the first
level of hashes.  Then, the hashes in this first level are grouped
into 'blocksize'-byte blocks (zero-padding the ends as needed) and
these blocks are hashed, producing the second level of hashes.  This
proceeds up the tree until only a single block remains.  The hash of
this block is the "Merkle tree root hash".

If the file fits in one block and is nonempty, then the "Merkle tree
root hash" is simply the hash of the single data block.  If the file
is empty, then the "Merkle tree root hash" is all zeroes.

The "blocks" here are not necessarily the same as "filesystem blocks".

If a salt was specified, then it's zero-padded to the closest multiple
of the input size of the hash algorithm's compression function, e.g.
64 bytes for SHA-256 or 128 bytes for SHA-512.  The padded salt is
prepended to every data or Merkle tree block that is hashed.

The purpose of the block padding is to cause every hash to be taken
over the same amount of data, which simplifies the implementation and
keeps open more possibilities for hardware acceleration.  The purpose
of the salt padding is to make the salting "free" when the salted hash
state is precomputed, then imported for each hash.
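
The salt padding can be sketched in a few lines.  This assumes that
"closest multiple" means rounding up to the next multiple of the
compression function's input size; both function names are
hypothetical, and the hash computation itself is left to whatever
crypto library the caller uses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Round the salt size up to a multiple of the hash algorithm's
 * compression-function input size (64 for SHA-256, 128 for SHA-512).
 */
size_t padded_salt_size(size_t salt_size, size_t hash_block_size)
{
        return (salt_size + hash_block_size - 1) / hash_block_size *
               hash_block_size;
}

/*
 * Build the byte string that is hashed for one data or Merkle tree
 * block: zero-padded salt, followed by the block itself.  `out` must
 * have room for padded_salt_size(...) + block_size bytes.  Returns the
 * number of bytes written.
 */
size_t build_hash_input(unsigned char *out,
                        const unsigned char *salt, size_t salt_size,
                        size_t hash_block_size,
                        const unsigned char *block, size_t block_size)
{
        size_t padded = padded_salt_size(salt_size, hash_block_size);

        memset(out, 0, padded);                 /* zero padding */
        if (salt_size)
                memcpy(out, salt, salt_size);
        memcpy(out + padded, block, block_size);
        return padded + block_size;
}
```

With no salt the padded size is 0, so the hash input is just the
block; with any salt up to the 32-byte maximum, SHA-256 sees exactly
one extra 64-byte compression-function input per block.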

Example: in the recommended configuration of SHA-256 and 4K blocks,
128 hash values fit in each block.  Thus, each level of the Merkle
tree is approximately 128 times smaller than the previous, and for
large files the Merkle tree's size converges to approximately 1/127 of
the original file size.  However, for small files, the padding is
significant, making the space overhead proportionally more.
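
The 1/127 figure follows from the geometric series 1/128 + 1/128^2 +
... = 1/127.  The level-by-level construction can also be counted
directly, as in the sketch below; ``merkle_tree_blocks()`` is a
hypothetical helper that only counts blocks and does no hashing.

```c
#include <stdint.h>

/*
 * Count the Merkle tree blocks needed for a file of `file_size` bytes,
 * following the level-by-level description above.  Returns 0 for an
 * empty file or a file that fits in a single block, since the root
 * hash is then derived without any tree blocks.
 */
uint64_t merkle_tree_blocks(uint64_t file_size, uint32_t block_size,
                            uint32_t digest_size)
{
        uint64_t hashes_per_block = block_size / digest_size;
        /* Number of data blocks, rounding up without overflow. */
        uint64_t blocks = file_size / block_size +
                          (file_size % block_size != 0);
        uint64_t total = 0;

        /* Each pass hashes one level and packs the hashes into blocks. */
        while (blocks > 1) {
                blocks = (blocks + hashes_per_block - 1) / hashes_per_block;
                total += blocks;
        }
        return total;
}
```

For a 1 GiB file with SHA-256 and 4096-byte blocks this yields
2048 + 16 + 1 = 2065 tree blocks, i.e. 8458240 bytes, which is close
to 1/127 of the file size.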

.. _fsverity_descriptor:

fs-verity descriptor
--------------------

By itself, the Merkle tree root hash is ambiguous.  For example, it
can't distinguish a large file from a small second file whose data
is exactly the top-level hash block of the first file.  Ambiguities
also arise from the convention of padding to the next block boundary.

To solve this problem, the verity file measurement is actually
computed as a hash of the following structure, which contains the
Merkle tree root hash as well as other fields such as the file size::

    struct fsverity_descriptor {
            __u8 version;           /* must be 1 */
            __u8 hash_algorithm;    /* Merkle tree hash algorithm */
            __u8 log_blocksize;     /* log2 of size of data and tree blocks */
            __u8 salt_size;         /* size of salt in bytes; 0 if none */
            __le32 sig_size;        /* must be 0 */
            __le64 data_size;       /* size of file the Merkle tree is built over */
            __u8 root_hash[64];     /* Merkle tree root hash */
            __u8 salt[32];          /* salt prepended to each hashed block */
            __u8 __reserved[144];   /* must be 0's */
    };

Note that the ``sig_size`` field must be set to 0 for the purpose of
computing the file measurement, even if a signature was provided (or
will be provided) to `FS_IOC_ENABLE_VERITY`_.
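
Userspace that computes the measurement itself has to reproduce this
layout byte for byte.  The sketch below mirrors the structure under a
different, hypothetical name (``verity_descriptor_sketch``) so that it
cannot clash with any system header, and fills it in with explicit
little-endian encoding.  Hashing the resulting bytes is left to the
caller's crypto library and is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the descriptor layout shown above (hypothetical name). */
struct verity_descriptor_sketch {
        uint8_t version;
        uint8_t hash_algorithm;
        uint8_t log_blocksize;
        uint8_t salt_size;
        uint32_t sig_size;      /* little-endian; 0 for the measurement */
        uint64_t data_size;     /* little-endian */
        uint8_t root_hash[64];
        uint8_t salt[32];
        uint8_t reserved[144];
};

_Static_assert(sizeof(struct verity_descriptor_sketch) == 256,
               "descriptor layout must be 256 bytes with no padding");

/* Store `v` at `p` in little-endian byte order, whatever the host is. */
static void put_le64(void *p, uint64_t v)
{
        uint8_t *b = p;
        int i;

        for (i = 0; i < 8; i++)
                b[i] = (uint8_t)(v >> (8 * i));
}

/* Returns 0 on success, or -1 if the root hash or salt is too long. */
int fill_descriptor(struct verity_descriptor_sketch *d,
                    uint8_t hash_algorithm, uint8_t log_blocksize,
                    uint64_t data_size,
                    const uint8_t *root_hash, size_t root_hash_size,
                    const uint8_t *salt, size_t salt_size)
{
        if (root_hash_size > sizeof(d->root_hash) ||
            salt_size > sizeof(d->salt))
                return -1;

        /* Zeroes sig_size, the reserved bytes, and unused hash/salt bytes. */
        memset(d, 0, sizeof(*d));
        d->version = 1;
        d->hash_algorithm = hash_algorithm;
        d->log_blocksize = log_blocksize;
        d->salt_size = (uint8_t)salt_size;
        put_le64(&d->data_size, data_size);
        memcpy(d->root_hash, root_hash, root_hash_size);
        if (salt_size)
                memcpy(d->salt, salt, salt_size);
        return 0;
}
```

For 4096-byte blocks, ``log_blocksize`` would be 12.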

Built-in signature verification
===============================

With CONFIG_FS_VERITY_BUILTIN_SIGNATURES=y, fs-verity supports putting
a portion of an authentication policy (see `Use cases`_) in the
kernel.  Specifically, it adds the following:

1. At fs-verity module initialization time, a keyring ".fs-verity" is
   created.  The root user can add trusted X.509 certificates to this
   keyring using the add_key() system call, then (when done)
   optionally use keyctl_restrict_keyring() to prevent additional
   certificates from being added.

2. `FS_IOC_ENABLE_VERITY`_ accepts a pointer to a PKCS#7 formatted
   detached signature in DER format of the file measurement.  On
   success, this signature is persisted alongside the Merkle tree.
   Then, any time the file is opened, the kernel will verify the
   file's actual measurement against this signature, using the
   certificates in the ".fs-verity" keyring.

3. A new sysctl "fs.verity.require_signatures" is made available.
   When set to 1, the kernel requires that all verity files have a
   correctly signed file measurement as described in (2).

File measurements must be signed in the following format, which is
similar to the structure used by `FS_IOC_MEASURE_VERITY`_::

    struct fsverity_signed_digest {
            char magic[8];                  /* must be "FSVerity" */
            __le16 digest_algorithm;
            __le16 digest_size;
            __u8 digest[];
    };

fs-verity's built-in signature verification support is meant as a
relatively simple mechanism that can be used to provide some level of
authenticity protection for verity files, as an alternative to doing
the signature verification in userspace or using IMA-appraisal.
However, with this mechanism, userspace programs still need to check
that the verity bit is set, and there is no protection against verity
files being swapped around.
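
The signed-digest layout described in this section can be serialized
as in the following sketch.  ``build_signed_digest()`` is a
hypothetical helper; it only produces the bytes to be signed, and
creating the PKCS#7 signature over them is left to an external tool or
library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Serialize the to-be-signed buffer: 8-byte magic, little-endian
 * algorithm identifier, little-endian digest size, then the digest.
 * Returns the number of bytes written, or 0 if `out` is too small.
 */
size_t build_signed_digest(uint8_t *out, size_t out_size,
                           uint16_t digest_algorithm,
                           const uint8_t *digest, uint16_t digest_size)
{
        size_t total = 8 + 2 + 2 + (size_t)digest_size;

        if (out_size < total)
                return 0;

        memcpy(out, "FSVerity", 8);     /* magic, without a NUL terminator */
        out[8] = (uint8_t)(digest_algorithm & 0xff);
        out[9] = (uint8_t)(digest_algorithm >> 8);
        out[10] = (uint8_t)(digest_size & 0xff);
        out[11] = (uint8_t)(digest_size >> 8);
        memcpy(out + 12, digest, digest_size);
        return total;
}
```

The ``digest`` passed in here would be the file measurement, for
example as returned by `FS_IOC_MEASURE_VERITY`_ or as computed
offline from the fs-verity descriptor.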

Filesystem support
==================

fs-verity is currently supported by the ext4 and f2fs filesystems.
The CONFIG_FS_VERITY kconfig option must be enabled to use fs-verity
on either filesystem.

``include/linux/fsverity.h`` declares the interface between the
``fs/verity/`` support layer and filesystems.  Briefly, filesystems
must provide an ``fsverity_operations`` structure that provides
methods to read and write the verity metadata to a filesystem-specific
location, including the Merkle tree blocks and
``fsverity_descriptor``.  Filesystems must also call functions in
``fs/verity/`` at certain times, such as when a file is opened or when
pages have been read into the pagecache.  (See `Verifying data`_.)

ext4
----

ext4 supports fs-verity since Linux TODO and e2fsprogs v1.45.2.

To create verity files on an ext4 filesystem, the filesystem must have
been formatted with ``-O verity`` or had ``tune2fs -O verity`` run on
it.  "verity" is an RO_COMPAT filesystem feature, so once set, old
kernels will only be able to mount the filesystem readonly, and old
versions of e2fsck will be unable to check the filesystem.  Moreover,
currently ext4 only supports mounting a filesystem with the "verity"
feature when its block size is equal to PAGE_SIZE (often 4096 bytes).

ext4 sets the EXT4_VERITY_FL on-disk inode flag on verity files.  It
can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be cleared.

ext4 also supports encryption, which can be used simultaneously with
fs-verity.  In this case, the plaintext data is verified rather than
the ciphertext.  This is necessary in order to make the file
measurement meaningful, since every file is encrypted differently.

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size.  This approach works because (a) verity files are readonly,
and (b) pages fully beyond i_size aren't visible to userspace but can
be read/written internally by ext4 with only some relatively small
changes to ext4.  This approach avoids having to depend on the
EA_INODE feature and on rearchitecting ext4's xattr support to
support paging multi-gigabyte xattrs into memory, and to support
encrypting xattrs.  Note that the verity metadata *must* be encrypted
when the file is, since it contains hashes of the plaintext data.

Currently, ext4 verity only supports the case where the Merkle tree
block size, filesystem block size, and page size are all the same.  It
also only supports extent-based files.

f2fs
----

f2fs supports fs-verity since Linux TODO and f2fs-tools v1.11.0.

To create verity files on an f2fs filesystem, the filesystem must have
been formatted with ``-O verity``.

f2fs sets the FADVISE_VERITY_BIT on-disk inode flag on verity files.
It can only be set by `FS_IOC_ENABLE_VERITY`_, and it cannot be
cleared.

Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first
64K boundary beyond i_size.  See explanation for ext4 above.
Moreover, f2fs supports at most 4096 bytes of xattr entries per inode,
which wouldn't be enough for even a single Merkle tree block.

Currently, f2fs verity only supports a Merkle tree block size of 4096.
Also, f2fs doesn't support enabling verity on files that currently
have atomic or volatile writes pending.

Implementation details
======================

Verifying data
--------------

fs-verity ensures that all reads of a verity file's data are verified,
regardless of which syscall is used to do the read (e.g. mmap(),
read(), pread()) and regardless of whether it's the first read or a
later read (unless the later read can return cached data that was
already verified).  Below, we describe how filesystems implement this.

Pagecache
~~~~~~~~~

For filesystems using Linux's pagecache, the ``->readpage()`` and
``->readpages()`` methods must be modified to verify pages before they
are marked Uptodate.  Merely hooking ``->read_iter()`` would be
insufficient, since ``->read_iter()`` is not used for memory maps.

Therefore, fs/verity/ provides a function fsverity_verify_page() which
verifies a page that has been read into the pagecache of a verity
inode, but is still locked and not Uptodate, so it's not yet readable
by userspace.  As needed to do the verification,
fsverity_verify_page() will call back into the filesystem to read
Merkle tree pages via fsverity_operations::read_merkle_tree_page().

fsverity_verify_page() returns false if verification failed; in this
case, the filesystem must not set the page Uptodate.  Following this,
as per the usual Linux pagecache behavior, attempts by userspace to
read() from the part of the file containing the page will fail with
EIO, and accesses to the page within a memory map will raise SIGBUS.

fsverity_verify_page() currently only supports the case where the
489*6ff2deb2SEric Biggersfsverity_verify_page() currently only supports the case where the
490*6ff2deb2SEric BiggersMerkle tree block size is equal to PAGE_SIZE (often 4096 bytes).
491*6ff2deb2SEric Biggers
492*6ff2deb2SEric BiggersIn principle, fsverity_verify_page() verifies the entire path in the
493*6ff2deb2SEric BiggersMerkle tree from the data page to the root hash.  However, for
494*6ff2deb2SEric Biggersefficiency the filesystem may cache the hash pages.  Therefore,
495*6ff2deb2SEric Biggersfsverity_verify_page() only ascends the tree reading hash pages until
496*6ff2deb2SEric Biggersan already-verified hash page is seen, as indicated by the PageChecked
497*6ff2deb2SEric Biggersbit being set.  It then verifies the path to that page.
498*6ff2deb2SEric Biggers
499*6ff2deb2SEric BiggersThis optimization, which is also used by dm-verity, results in
500*6ff2deb2SEric Biggersexcellent sequential read performance.  This is because usually (e.g.
501*6ff2deb2SEric Biggers127 in 128 times for 4K blocks and SHA-256) the hash page from the
502*6ff2deb2SEric Biggersbottom level of the tree will already be cached and checked from
503*6ff2deb2SEric Biggersreading a previous data page.  However, random reads perform worse.
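
The "127 in 128" figure follows from 4096 / 32 = 128 hashes per Merkle
tree block.  The following is a toy userspace model of a sequential
read, not kernel code: the function names are hypothetical, and it
assumes that a bottom-level hash block stays cached and checked once
the first data block it covers has been read:

```c
#include <stddef.h>

#define MERKLE_BLOCK_SIZE 4096u /* 4K Merkle tree blocks, per the text */
#define SHA256_DIGEST_LEN 32u   /* SHA-256 produces 32-byte digests */

/* Number of hashes that fit in one Merkle tree block (4096 / 32). */
static unsigned int hashes_per_block(void)
{
	return MERKLE_BLOCK_SIZE / SHA256_DIGEST_LEN;
}

/*
 * Toy model: read nr_data_blocks data blocks in order and count how
 * many of them find their bottom-level hash block already checked by
 * an earlier read.  Assumes the hash block is never evicted.
 */
static size_t count_cached_leaf_hits(size_t nr_data_blocks)
{
	size_t hits = 0;
	size_t last_checked = (size_t)-1; /* no hash block checked yet */

	for (size_t i = 0; i < nr_data_blocks; i++) {
		size_t hash_block = i / hashes_per_block();

		if (hash_block == last_checked)
			hits++;
		else
			last_checked = hash_block; /* first use: verify it */
	}
	return hits;
}
```

In this model, reading 128 consecutive data blocks verifies the
bottom-level hash block once and reuses it for the other 127 reads.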

Block device based filesystems
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Block device based filesystems (e.g. ext4 and f2fs) in Linux also use
the pagecache, so the above subsection applies too.  However, they
also usually read many pages from a file at once, grouped into a
structure called a "bio".  To make it easier for these types of
filesystems to support fs-verity, fs/verity/ also provides a function
fsverity_verify_bio() which verifies all pages in a bio.

ext4 and f2fs also support encryption.  If a verity file is also
encrypted, the pages must be decrypted before being verified.  To
support this, these filesystems allocate a "post-read context" for
each bio and store it in ``->bi_private``::

    struct bio_post_read_ctx {
            struct bio *bio;
            struct work_struct work;
            unsigned int cur_step;
            unsigned int enabled_steps;
    };

``enabled_steps`` is a bitmask that specifies whether decryption,
verity, or both is enabled.  After the bio completes, for each needed
postprocessing step the filesystem enqueues the bio_post_read_ctx on a
workqueue, and then the workqueue work does the decryption or
verification.  Finally, pages where no decryption or verity error
occurred are marked Uptodate, and the pages are unlocked.
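
The way ``cur_step`` and ``enabled_steps`` could drive the
postprocessing can be sketched as a small userspace state machine.
This is only a model under stated assumptions: the ``STEP_*`` names,
the ``post_read_model`` struct, and the ``next_post_read_step()``
helper are hypothetical, and the workqueue, decryption, and
verification themselves are not modeled:

```c
/* Hypothetical step numbering; the names are assumptions. */
enum post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
	STEP_DONE,
};

/* Reduced stand-in for the cur_step/enabled_steps pair. */
struct post_read_model {
	unsigned int cur_step;
	unsigned int enabled_steps; /* bitmask of (1 << STEP_*) */
};

/*
 * Advance to the next enabled step after cur_step and return it, or
 * return STEP_DONE once no enabled steps remain.  Decryption is
 * ordered before verity, since pages must be decrypted first.
 */
static enum post_read_step next_post_read_step(struct post_read_model *ctx)
{
	while (++ctx->cur_step < (unsigned int)STEP_DONE) {
		if (ctx->enabled_steps & (1u << ctx->cur_step))
			return (enum post_read_step)ctx->cur_step;
	}
	ctx->cur_step = STEP_DONE;
	return STEP_DONE;
}
```

With both bits set, successive calls yield the decrypt step, then the
verity step, then ``STEP_DONE``; a file that is not encrypted skips
straight to the verity step.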

Files on ext4 and f2fs may contain holes.  Normally, ``->readpages()``
simply zeroes holes and sets the corresponding pages Uptodate; no bios
are issued.  To prevent this case from bypassing fs-verity, these
filesystems use fsverity_verify_page() to verify hole pages.

ext4 and f2fs disable direct I/O on verity files, since otherwise
direct I/O would bypass fs-verity.  (They also do the same for
encrypted files.)

Userspace utility
=================

This document focuses on the kernel, but a userspace utility for
fs-verity can be found at:

	https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/fsverity-utils.git

See the README.md file in the fsverity-utils source tree for details,
including examples of setting up fs-verity protected files.

Tests
=====

To test fs-verity, use xfstests.  For example, using `kvm-xfstests
<https://github.com/tytso/xfstests-bld/blob/master/Documentation/kvm-quickstart.md>`_::

    kvm-xfstests -c ext4,f2fs -g verity

FAQ
===

This section answers frequently asked questions about fs-verity that
weren't already directly answered in other parts of this document.

:Q: Why isn't fs-verity part of IMA?
:A: fs-verity and IMA (Integrity Measurement Architecture) have
    different focuses.  fs-verity is a filesystem-level mechanism for
    hashing individual files using a Merkle tree.  In contrast, IMA
    defines a system-wide policy that specifies which files are
    hashed and what to do with those hashes, such as logging them,
    authenticating them, or adding them to a measurement list.

    IMA is planned to support the fs-verity hashing mechanism as an
    alternative to doing full file hashes, for people who want the
    performance and security benefits of the Merkle tree based hash.
    But it doesn't make sense to force all uses of fs-verity to be
    through IMA.  As a standalone filesystem feature, fs-verity
    already meets many users' needs, and it's testable like other
    filesystem features, e.g. with xfstests.

:Q: Isn't fs-verity useless because the attacker can just modify the
    hashes in the Merkle tree, which is stored on-disk?
:A: To verify the authenticity of an fs-verity file you must verify
    the authenticity of the "file measurement", which is basically the
    root hash of the Merkle tree.  See `Use cases`_.

:Q: Isn't fs-verity useless because the attacker can just replace a
    verity file with a non-verity one?
:A: See `Use cases`_.  In the initial use case, it's really trusted
    userspace code that authenticates the files; fs-verity is just a
    tool to do this job efficiently and securely.  The trusted
    userspace code will consider non-verity files to be inauthentic.

:Q: Why does the Merkle tree need to be stored on-disk?  Couldn't you
    store just the root hash?
:A: If the Merkle tree wasn't stored on-disk, then you'd have to
    compute the entire tree when the file is first accessed, even if
    just one byte is being read.  This is a fundamental consequence of
    how Merkle tree hashing works.  To verify a leaf node, you need to
    verify the whole path to the root hash, including the root node
    (the thing which the root hash is a hash of).  But if the root
    node isn't stored on-disk, you have to compute it by hashing its
    children, and so on until you've actually hashed the entire file.

    That defeats most of the point of doing a Merkle tree-based hash,
    since if you have to hash the whole file ahead of time anyway,
    then you could simply do sha256(file) instead.  That would be much
    simpler, and a bit faster too.

    It's true that an in-memory Merkle tree could still provide the
    advantage of verification on every read rather than just on the
    first read.  However, it would be inefficient because every time a
    hash page gets evicted (you can't pin the entire Merkle tree into
    memory, since it may be very large), in order to restore it you
    again need to hash everything below it in the tree.  This again
    defeats most of the point of doing a Merkle tree-based hash, since
    a single block read could trigger re-hashing gigabytes of data.

:Q: But couldn't you store just the leaf nodes and compute the rest?
:A: See previous answer; this really just moves up one level, since
    one could alternatively interpret the data blocks as being the
    leaf nodes of the Merkle tree.  It's true that the tree can be
    computed much faster if the leaf level is stored rather than just
    the data, but that's only because each level is less than 1% the
    size of the level below (assuming the recommended settings of
    SHA-256 and 4K blocks).  For the exact same reason, by storing
    "just the leaf nodes" you'd already be storing over 99% of the
    tree, so you might as well simply store the whole tree.

:Q: Can the Merkle tree be built ahead of time, e.g. distributed as
    part of a package that is installed to many computers?
:A: This isn't currently supported.  It was part of the original
    design, but was removed to simplify the kernel UAPI and because it
    wasn't a critical use case.  Files are usually installed once and
    used many times, and cryptographic hashing is somewhat fast on
    most modern processors.

:Q: Why doesn't fs-verity support writes?
:A: Write support would be very difficult and would require a
    completely different design, so it's well outside the scope of
    fs-verity.  Write support would require:

    - A way to maintain consistency between the data and hashes,
      including all levels of hashes, since corruption after a crash
      (especially of potentially the entire file!) is unacceptable.
      The main options for solving this are data journalling,
      copy-on-write, and log-structured volume.  But it's very hard to
      retrofit existing filesystems with new consistency mechanisms.
      Data journalling is available on ext4, but is very slow.

    - Rebuilding the Merkle tree after every write, which would be
      extremely inefficient.  Alternatively, a different authenticated
      dictionary structure such as an "authenticated skiplist" could
      be used.  However, this would be far more complex.

    Compare it to dm-verity vs. dm-integrity.  dm-verity is very
    simple: the kernel just verifies read-only data against a
    read-only Merkle tree.  In contrast, dm-integrity supports writes
    but is slow, is much more complex, and doesn't actually support
    full-device authentication since it authenticates each sector
    independently, i.e. there is no "root hash".  It doesn't really
    make sense for the same device-mapper target to support these two
    very different cases; the same applies to fs-verity.

:Q: Since verity files are immutable, why isn't the immutable bit set?
:A: The existing "immutable" bit (FS_IMMUTABLE_FL) already has a
    specific set of semantics which not only make the file contents
    read-only, but also prevent the file from being deleted, renamed,
    linked to, or having its owner or mode changed.  These extra
    properties are unwanted for fs-verity, so reusing the immutable
    bit isn't appropriate.

:Q: Why does the API use ioctls instead of setxattr() and getxattr()?
:A: Abusing the xattr interface for basically arbitrary syscalls is
    heavily frowned upon by most of the Linux filesystem developers.
    An xattr should really just be an xattr on-disk, not an API to
    e.g. magically trigger construction of a Merkle tree.

:Q: Does fs-verity support remote filesystems?
:A: Only ext4 and f2fs support is implemented currently, but in
    principle any filesystem that can store per-file verity metadata
    can support fs-verity, regardless of whether it's local or remote.
    Some filesystems may have fewer options of where to store the
    verity metadata; one possibility is to store it past the end of
    the file and "hide" it from userspace by manipulating i_size.  The
    data verification functions provided by ``fs/verity/`` also assume
    that the filesystem uses the Linux pagecache, but both local and
    remote filesystems normally do so.

:Q: Why is anything filesystem-specific at all?  Shouldn't fs-verity
    be implemented entirely at the VFS level?
:A: There are many reasons why this is not possible or would be very
    difficult, including the following:

    - To prevent bypassing verification, pages must not be marked
      Uptodate until they've been verified.  Currently, each
      filesystem is responsible for marking pages Uptodate via
      ``->readpages()``.  Therefore, currently it's not possible for
      the VFS to do the verification on its own.  Changing this would
      require significant changes to the VFS and all filesystems.

    - It would require defining a filesystem-independent way to store
      the verity metadata.  Extended attributes don't work for this
      because (a) the Merkle tree may be gigabytes, but many
      filesystems assume that all xattrs fit into a single 4K
      filesystem block, and (b) ext4 and f2fs encryption doesn't
      encrypt xattrs, yet the Merkle tree *must* be encrypted when the
      file contents are, because it stores hashes of the plaintext
      file contents.

      So the verity metadata would have to be stored in an actual
      file.  Using a separate file would be very ugly, since the
      metadata is fundamentally part of the file to be protected, and
      it could cause problems where users could delete the real file
      but not the metadata file or vice versa.  On the other hand,
      having it be in the same file would break applications unless
      filesystems' notion of i_size were divorced from the VFS's,
      which would be complex and require changes to all filesystems.

    - It's desirable that FS_IOC_ENABLE_VERITY uses the filesystem's
      transaction mechanism so that either the file ends up with
      verity enabled, or no changes were made.  Allowing intermediate
      states to occur after a crash may cause problems.