#
09edf4d3 |
| 05-Nov-2019 |
Matthew Bobrowski <mbobrowski@mbobrowski.org> |
ext4: introduce new callback for IOMAP_REPORT
As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin()
ext4: introduce new callback for IOMAP_REPORT
As part of the ext4_iomap_begin() cleanups that precede this patch, we also split up the IOMAP_REPORT branch into a completely separate ->iomap_begin() callback named ext4_iomap_begin_report(). Again, the raionale for this change is to reduce the overall clutter within ext4_iomap_begin().
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
Revision tags: v5.3.7 |
|
#
c8cc8816 |
| 16-Oct-2019 |
Ritesh Harjani <riteshh@linux.ibm.com> |
ext4: Add support for blocksize < pagesize in dioread_nolock
This patch adds the support for blocksize < pagesize for dioread_nolock feature.
Since in case of blocksize < pagesize, we can have mult
ext4: Add support for blocksize < pagesize in dioread_nolock
This patch adds the support for blocksize < pagesize for dioread_nolock feature.
Since in case of blocksize < pagesize, we can have multiple small buffers of page as unwritten extents, we need to maintain a vector of these unwritten extents which needs the conversion after the IO is complete. Thus, we maintain a list of tuple <offset, size> pair (io_end_vec) for this & traverse this list to do the unwritten to written conversion.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191016073711.4141-5-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
a00713ea |
| 16-Oct-2019 |
Ritesh Harjani <riteshh@linux.ibm.com> |
ext4: Add API to bring in support for unwritten io_end_vec conversion
This patch just brings in the API for conversion of unwritten io_end_vec extents which will be required for blocksize < pagesize
ext4: Add API to bring in support for unwritten io_end_vec conversion
This patch just brings in the API for conversion of unwritten io_end_vec extents which will be required for blocksize < pagesize support for dioread_nolock feature.
No functional changes in this patch.
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20191016073711.4141-3-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
Revision tags: v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12 |
|
#
cba465b4 |
| 04-Sep-2019 |
Deepa Dinamani <deepa.kernel@gmail.com> |
ext4: Reduce ext4 timestamp warnings
When ext4 file systems were created intentionally with 128 byte inodes, the rate-limited warning of eventual possible timestamp overflow are still emitted rather
ext4: Reduce ext4 timestamp warnings
When ext4 file systems were created intentionally with 128 byte inodes, the rate-limited warning of eventual possible timestamp overflow are still emitted rather frequently. Remove the warning for now.
Discussion for whether any warning is needed, and where it should be emitted, can be found at https://lore.kernel.org/lkml/1567523922.5576.57.camel@lca.pw/. I can post a separate follow-up patch after the conclusion.
Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
show more ...
|
Revision tags: v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17, v4.16, v4.15 |
|
#
4881c497 |
| 21-Jan-2018 |
Deepa Dinamani <deepa.kernel@gmail.com> |
ext4: Initialize timestamps limits
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available.
The timestamp limits are calculated according to the encoding
ext4: Initialize timestamps limits
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available.
The timestamp limits are calculated according to the encoding table in a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):
* extra msb of adjust for signed * epoch 32-bit 32-bit tv_sec to * bits time decoded 64-bit tv_sec 64-bit tv_sec valid time range * 0 0 1 -0x80000000..-0x00000001 0x000000000 1901-12-13..1969-12-31 * 0 0 0 0x000000000..0x07fffffff 0x000000000 1970-01-01..2038-01-19 * 0 1 1 0x080000000..0x0ffffffff 0x100000000 2038-01-19..2106-02-07 * 0 1 0 0x100000000..0x17fffffff 0x100000000 2106-02-07..2174-02-25 * 1 0 1 0x180000000..0x1ffffffff 0x200000000 2174-02-25..2242-03-16 * 1 0 0 0x200000000..0x27fffffff 0x200000000 2242-03-16..2310-04-04 * 1 1 1 0x280000000..0x2ffffffff 0x300000000 2310-04-04..2378-04-22 * 1 1 0 0x300000000..0x37fffffff 0x300000000 2378-04-22..2446-05-10
Note that the time limits are not correct for deletion times.
Added a warn when an inode cannot be extended to incorporate an extended timestamp.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Acked-by: Jeff Layton <jlayton@kernel.org> Cc: tytso@mit.edu Cc: adilger.kernel@dilger.ca Cc: linux-ext4@vger.kernel.org
show more ...
|
#
7727ae52 |
| 28-Aug-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
ext4: fix potential use after free after remounting with noblock_validity
Remount process will release system zone which was allocated before if "noblock_validity" is specified. If we mount an ext4
ext4: fix potential use after free after remounting with noblock_validity
Remount process will release system zone which was allocated before if "noblock_validity" is specified. If we mount an ext4 file system to two mountpoints with default mount options, and then remount one of them with "noblock_validity", it may trigger a use after free problem when someone accessing the other one.
# mount /dev/sda foo # mount /dev/sda bar
User access mountpoint "foo" | Remount mountpoint "bar" | ext4_map_blocks() | ext4_remount() check_block_validity() | ext4_setup_system_zone() ext4_data_block_valid() | ext4_release_system_zone() | free system_blks rb nodes access system_blks rb nodes | trigger use after free |
This problem can also be reproduced by one mountpint, At the same time, add_system_zone() can get called during remount as well so there can be racing ext4_data_block_valid() reading the rbtree at the same time.
This patch add RCU to protect system zone from releasing or building when doing a remount which inverse current "noblock_validity" mount option. It assign the rbtree after the whole tree was complete and do actual freeing after rcu grace period, avoid any intermediate state.
Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
show more ...
|
#
8fcc3a58 |
| 22-Aug-2019 |
Eric Whitney <enwlinux@gmail.com> |
ext4: rework reserved cluster accounting when invalidating pages
The goal of this patch is to remove two references to the buffer delay bit in ext4_da_page_release_reservation() as part of a larger
ext4: rework reserved cluster accounting when invalidating pages
The goal of this patch is to remove two references to the buffer delay bit in ext4_da_page_release_reservation() as part of a larger effort to remove all such references from ext4. These two references are principally used to reduce the reserved block/cluster count when pages are invalidated as a result of truncating, punching holes, or collapsing a block range in a file. The entire function is removed and replaced with code in ext4_es_remove_extent() that reduces the reserved count as a side effect of removing a block range from delayed and not unwritten extents in the extent status tree as is done when truncating, punching holes, or collapsing ranges.
The code is written to minimize the number of searches descending from rb tree roots for scalability.
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
7963e5ac |
| 22-Aug-2019 |
ZhangXiaoxu <zhangxiaoxu5@huawei.com> |
ext4: treat buffers with write errors as containing valid data
I got some errors when I repair an ext4 volume which stacked by an iscsi target: Entry 'test60' in / (2) has deleted/unused inode 7
ext4: treat buffers with write errors as containing valid data
I got some errors when I repair an ext4 volume which stacked by an iscsi target: Entry 'test60' in / (2) has deleted/unused inode 73750. Clear? It can be reproduced when the network not good enough.
When I debug this I found ext4 will read entry buffer from disk and the buffer is marked with write_io_error.
If the buffer is marked with write_io_error, it means it already wroten to journal, and not checked out to disk. IOW, the journal is newer than the data in disk. If this journal record 'delete test60', it means the 'test60' still on the disk metadata.
In this case, if we read the buffer from disk successfully and create file continue, the new journal record will overwrite the journal which record 'delete test60', then the entry corruptioned.
So, use the buffer rather than read from disk if the buffer is marked with write_io_error.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
22cfe4b4 |
| 22-Jul-2019 |
Eric Biggers <ebiggers@google.com> |
ext4: add fs-verity read support
Make ext4_mpage_readpages() verify data as it is read from fs-verity files, using the helper functions from fs/verity/.
To support both encryption and verity simult
ext4: add fs-verity read support
Make ext4_mpage_readpages() verify data as it is read from fs-verity files, using the helper functions from fs/verity/.
To support both encryption and verity simultaneously, this required refactoring the decryption workflow into a generic "post-read processing" workflow which can do decryption, verification, or both.
The case where the ext4 block size is not equal to the PAGE_SIZE is not supported yet, since in that case ext4_mpage_readpages() sometimes falls back to block_read_full_page(), which does not support fs-verity yet.
Co-developed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
#
c93d8f88 |
| 22-Jul-2019 |
Eric Biggers <ebiggers@google.com> |
ext4: add basic fs-verity support
Add most of fs-verity support to ext4. fs-verity is a filesystem feature that enables transparent integrity protection and authentication of read-only files. It u
ext4: add basic fs-verity support
Add most of fs-verity support to ext4. fs-verity is a filesystem feature that enables transparent integrity protection and authentication of read-only files. It uses a dm-verity like mechanism at the file level: a Merkle tree is used to verify any block in the file in log(filesize) time. It is implemented mainly by helper functions in fs/verity/. See Documentation/filesystems/fsverity.rst for the full documentation.
This commit adds all of ext4 fs-verity support except for the actual data verification, including:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an inode and reading/writing the verity metadata.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
ext4 stores the verity metadata (Merkle tree and fsverity_descriptor) past the end of the file, starting at the first 64K boundary beyond i_size. This approach works because (a) verity files are readonly, and (b) pages fully beyond i_size aren't visible to userspace but can be read/written internally by ext4 with only some relatively small changes to ext4. This approach avoids having to depend on the EA_INODE feature and on rearchitecturing ext4's xattr support to support paging multi-gigabyte xattrs into memory, and to support encrypting xattrs. Note that the verity metadata *must* be encrypted when the file is, since it contains hashes of the plaintext data.
This patch incorporates work by Theodore Ts'o and Chandan Rajendra.
Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
show more ...
|
#
cd2d9922 |
| 12-Aug-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: drop legacy pre-1970 encoding workaround
Originally, support for expanded timestamps had a bug in that pre-1970 times were erroneously encoded as being in the the 24th century. This was fixed
ext4: drop legacy pre-1970 encoding workaround
Originally, support for expanded timestamps had a bug in that pre-1970 times were erroneously encoded as being in the the 24th century. This was fixed in commit a4dad1ae24f8 ("ext4: Fix handling of extended tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps were correctly encoded, but for backwards compatibility those incorrectly encoded timestamps were mapped back to the pre-1970 dates.
Given that backwards compatibility workaround has been around for 4 years, and given that running e2fsck from e2fsprogs 1.43.2 and later will offer to fix these timestamps (which has been released for 3 years), it's past time to drop the legacy workaround from the kernel.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
bb5835ed |
| 11-Aug-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add new ioctl EXT4_IOC_GET_ES_CACHE
For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fi
ext4: add new ioctl EXT4_IOC_GET_ES_CACHE
For debugging reasons, it's useful to know the contents of the extent cache. Since the extent cache contains much of what is in the fiemap ioctl, use an fiemap-style interface to return this information.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
1ad3ea6e |
| 11-Aug-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add a new ioctl EXT4_IOC_GETSTATE
The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of an ext4 inode for debugging purposes.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
b0c013e2 |
| 11-Aug-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE
The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent status cache to be cleared out. This is intended for use for debugging.
Signed-off-
ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE
The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent status cache to be cleared out. This is intended for use for debugging.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
7633b08b |
| 21-Jun-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
Clean up namespace pollution by the inline_data code.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ddce3b94 |
| 21-Jun-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: refactor initialize_dirent_tail()
Move the calculation of the location of the dirent tail into initialize_dirent_tail(). Also prefix the function with ext4_ to fix kernel namepsace polution.
ext4: refactor initialize_dirent_tail()
Move the calculation of the location of the dirent tail into initialize_dirent_tail(). Also prefix the function with ext4_ to fix kernel namepsace polution.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
f036adb3 |
| 21-Jun-2019 |
Theodore Ts'o <tytso@mit.edu> |
ext4: rename "dirent_csum" functions to use "dirblock"
Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set() don't actually operate on a directory entry, but a directory block. And
ext4: rename "dirent_csum" functions to use "dirblock"
Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set() don't actually operate on a directory entry, but a directory block. And while they take a struct ext4_dir_entry *dirent as an argument, it had better be the first directory at the beginning of the direct block, or things will go very wrong.
Rename the following functions so that things make more sense, and remove a lot of confusing casts along the way:
ext4_dirent_csum_verify -> ext4_dirblock_csum_verify ext4_dirent_csum_set -> ext4_dirblock_csum_set ext4_dirent_csum -> ext4_dirblock_csum ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
3ae72562 |
| 19-Jun-2019 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
ext4: optimize case-insensitive lookups
Temporarily cache a casefolded version of the file name under lookup in ext4_filename, to avoid repeatedly casefolding it. I got up to 30% speedup on lookups
ext4: optimize case-insensitive lookups
Temporarily cache a casefolded version of the file name under lookup in ext4_filename, to avoid repeatedly casefolding it. I got up to 30% speedup on lookups of large directories (>100k entries), depending on the length of the string under lookup.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
7ddf79a1 |
| 09-Jun-2019 |
Wang Shilong <wshilong@ddn.com> |
ext4: only set project inherit bit for directory
It doesn't make any sense to have project inherit bits for regular files, even though this won't cause any problem, but it is better fix this.
Signe
ext4: only set project inherit bit for directory
It doesn't make any sense to have project inherit bits for regular files, even though this won't cause any problem, but it is better fix this.
Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
show more ...
|
#
b886ee3e |
| 25-Apr-2019 |
Gabriel Krisman Bertazi <krisman@collabora.co.uk> |
ext4: Support case-insensitive file name lookups
This patch implements the actual support for case-insensitive file name lookups in ext4, based on the feature bit and the encoding stored in the supe
ext4: Support case-insensitive file name lookups
This patch implements the actual support for case-insensitive file name lookups in ext4, based on the feature bit and the encoding stored in the superblock.
A filesystem that has the casefold feature set is able to configure directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups to succeed in that directory in a case-insensitive fashion, i.e: match a directory entry even if the name used by userspace is not a byte per byte match with the disk name, but is an equivalent case-insensitive version of the Unicode string. This operation is called a case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories and inherited by its children. This attribute can only be enabled on empty directories for filesystems that support the encoding feature, thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, Ext4 only stores the first equivalent name dentry used in the dcache. This is done to prevent unintentional duplication of dentries in the dcache, while also allowing the VFS code to quickly find the right entry in the cache despite which equivalent string was used in a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the casefolded string, such that we always have a well-known bucket for all the equivalencies of the same string. d_compare() uses the utf8_strncasecmp() infrastructure, which handles the comparison of equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the vfs layer to fix. We can live without that for now, and so does everyone else.
* on-disk data:
Despite using a specific version of the name as the internal representation within the dcache, the name stored and fetched from the disk is a byte-per-byte match with what the user requested, making this implementation 'name-preserving'. i.e. no actual information is lost when writing to storage.
DX is supported by modifying the hashes used in +F directories to make them case/encoding-aware. The new disk hashes are calculated as the hash of the full casefolded string, instead of the string directly. This allows us to efficiently search for file names in the htree without requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when a invalid UTF-8 sequence is identified, ext4 will treat it as an opaque byte sequence, ignoring the encoding and reverting to the old behavior for that unique file. This means that case-insensitive file name lookup will not work only for that file. An optional bit can be set in the superblock telling the filesystem code and userspace tools to enforce the encoding. When that optional bit is set, any attempt to create a file name using an invalid UTF-8 sequence will fail and return an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in ext4 is implemented lives in fs/unicode, and is based on a previous version developed by SGI. It implements the Canonical decomposition (NFD) algorithm described by the Unicode specification 12.1, or higher, combined with the elimination of ignorable code points (NFDi) and full case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for EXT4 because:
- It has a lower cost than NFC/NFKC (which requires decomposing to NFD as an intermediary step) - It doesn't eliminate important semantic meaning like compatibility decompositions.
Although:
- This implementation is not completely linguistic accurate, because different languages have conflicting rules, which would require the specialization of the filesystem to a given locale, which brings all sorts of problems for removable media and for users who use more than one language.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
c83ad55e |
| 25-Apr-2019 |
Gabriel Krisman Bertazi <krisman@collabora.co.uk> |
ext4: include charset encoding information in the superblock
Support for encoding is considered an incompatible feature, since it has potential to create collisions of file names in existing filesys
ext4: include charset encoding information in the superblock
Support for encoding is considered an incompatible feature, since it has potential to create collisions of file names in existing filesystems. If the feature flag is not enabled, the entire filesystem will operate on opaque byte sequences, respecting the original behavior.
The s_encoding field stores a magic number indicating the encoding format and version used globally by file and directory names in the filesystem. The s_encoding_flags defines policies for using the charset encoding, like how to handle invalid sequences. The magic number is mapped to the exact charset table, but the mapping is specific to ext4. Since we don't have any commitment to support old encodings, the only encoding I am supporting right now is utf8-12.1.0.
The current implementation prevents the user from enabling encoding and per-directory encryption on the same filesystem at the same time. The incompatibility between these features lies in how we do efficient directory searches when we cannot be sure the encryption of the user provided fname will match the actual hash stored in the disk without decrypting every directory entry, because of normalization cases. My quickest solution is to simply block the concurrent use of these features for now, and enable it later, once we have a better solution.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
877b5691 |
| 14-Apr-2019 |
Eric Biggers <ebiggers@google.com> |
crypto: shash - remove shash_desc::flags
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algori
crypto: shash - remove shash_desc::flags
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op.
With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep.
Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all.
Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
show more ...
|
#
b01531db |
| 20-Mar-2019 |
Eric Biggers <ebiggers@google.com> |
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
->lookup() in an encrypted directory begins as follows:
1. fscrypt_prepare_lookup(): a. Try to load the directory's encry
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
->lookup() in an encrypted directory begins as follows:
1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name.
But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext.
This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added.
Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected.
Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update.
Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
fe53cbc5 |
| 06-Apr-2019 |
Eric Biggers <ebiggers@google.com> |
ext4: remove incorrect comment for NEXT_ORPHAN()
The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(), which was moved by commit a7550b30ab70 ("ext4 crypto: migrate into vfs's crypto
ext4: remove incorrect comment for NEXT_ORPHAN()
The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(), which was moved by commit a7550b30ab70 ("ext4 crypto: migrate into vfs's crypto engine") but the comment was accidentally left in place. Since ext4_encrypted_inode() has now been removed, just remove the comment.
Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
show more ...
|
#
c9e716eb |
| 14-Feb-2019 |
Andreas Dilger <adilger@dilger.ca> |
ext4: don't update s_rev_level if not required
Don't update the superblock s_rev_level during mount if it isn't actually necessary, only if superblock features are being set by the kernel. This was
ext4: don't update s_rev_level if not required
Don't update the superblock s_rev_level during mount if it isn't actually necessary, only if superblock features are being set by the kernel. This was originally added for ext3 since it always set the INCOMPAT_RECOVER and HAS_JOURNAL features during mount, but this is not needed since no journal mode was added to ext4.
That will allow Geert to mount his 20-year-old ext2 rev 0.0 m68k filesystem, as a testament of the backward compatibility of ext4.
Fixes: 0390131ba84f ("ext4: Allow ext4 to run without a journal") Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|