History log of /openbmc/linux/fs/ext4/ext4.h (Results 201 – 225 of 1492)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 09edf4d3 05-Nov-2019 Matthew Bobrowski <mbobrowski@mbobrowski.org>

ext4: introduce new callback for IOMAP_REPORT

As part of the ext4_iomap_begin() cleanups that precede this patch, we
also split up the IOMAP_REPORT branch into a completely separate
->iomap_begin()

ext4: introduce new callback for IOMAP_REPORT

As part of the ext4_iomap_begin() cleanups that precede this patch, we
also split up the IOMAP_REPORT branch into a completely separate
->iomap_begin() callback named ext4_iomap_begin_report(). Again, the
raionale for this change is to reduce the overall clutter within
ext4_iomap_begin().

Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/5c97a569e26ddb6696e3d3ac9fbde41317e029a0.1572949325.git.mbobrowski@mbobrowski.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


Revision tags: v5.3.7
# c8cc8816 16-Oct-2019 Ritesh Harjani <riteshh@linux.ibm.com>

ext4: Add support for blocksize < pagesize in dioread_nolock

This patch adds the support for blocksize < pagesize for
dioread_nolock feature.

Since in case of blocksize < pagesize, we can have mult

ext4: Add support for blocksize < pagesize in dioread_nolock

This patch adds the support for blocksize < pagesize for
dioread_nolock feature.

Since in case of blocksize < pagesize, we can have multiple
small buffers of page as unwritten extents, we need to
maintain a vector of these unwritten extents which needs
the conversion after the IO is complete. Thus, we maintain
a list of tuple <offset, size> pair (io_end_vec) for this &
traverse this list to do the unwritten to written conversion.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191016073711.4141-5-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# a00713ea 16-Oct-2019 Ritesh Harjani <riteshh@linux.ibm.com>

ext4: Add API to bring in support for unwritten io_end_vec conversion

This patch just brings in the API for conversion of unwritten io_end_vec
extents which will be required for blocksize < pagesize

ext4: Add API to bring in support for unwritten io_end_vec conversion

This patch just brings in the API for conversion of unwritten io_end_vec
extents which will be required for blocksize < pagesize support
for dioread_nolock feature.

No functional changes in this patch.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/20191016073711.4141-3-riteshh@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


Revision tags: v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12
# cba465b4 04-Sep-2019 Deepa Dinamani <deepa.kernel@gmail.com>

ext4: Reduce ext4 timestamp warnings

When ext4 file systems were created intentionally with 128 byte inodes,
the rate-limited warning of eventual possible timestamp overflow are
still emitted rather

ext4: Reduce ext4 timestamp warnings

When ext4 file systems were created intentionally with 128 byte inodes,
the rate-limited warning of eventual possible timestamp overflow are
still emitted rather frequently. Remove the warning for now.

Discussion for whether any warning is needed,
and where it should be emitted, can be found at
https://lore.kernel.org/lkml/1567523922.5576.57.camel@lca.pw/.
I can post a separate follow-up patch after the conclusion.

Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

show more ...


Revision tags: v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6, v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26, v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17, v4.16, v4.15
# 4881c497 21-Jan-2018 Deepa Dinamani <deepa.kernel@gmail.com>

ext4: Initialize timestamps limits

ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding

ext4: Initialize timestamps limits

ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding table in
a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):

* extra msb of adjust for signed
* epoch 32-bit 32-bit tv_sec to
* bits time decoded 64-bit tv_sec 64-bit tv_sec valid time range
* 0 0 1 -0x80000000..-0x00000001 0x000000000 1901-12-13..1969-12-31
* 0 0 0 0x000000000..0x07fffffff 0x000000000 1970-01-01..2038-01-19
* 0 1 1 0x080000000..0x0ffffffff 0x100000000 2038-01-19..2106-02-07
* 0 1 0 0x100000000..0x17fffffff 0x100000000 2106-02-07..2174-02-25
* 1 0 1 0x180000000..0x1ffffffff 0x200000000 2174-02-25..2242-03-16
* 1 0 0 0x200000000..0x27fffffff 0x200000000 2242-03-16..2310-04-04
* 1 1 1 0x280000000..0x2ffffffff 0x300000000 2310-04-04..2378-04-22
* 1 1 0 0x300000000..0x37fffffff 0x300000000 2378-04-22..2446-05-10

Note that the time limits are not correct for deletion times.

Added a warn when an inode cannot be extended to incorporate an
extended timestamp.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: tytso@mit.edu
Cc: adilger.kernel@dilger.ca
Cc: linux-ext4@vger.kernel.org

show more ...


# 7727ae52 28-Aug-2019 zhangyi (F) <yi.zhang@huawei.com>

ext4: fix potential use after free after remounting with noblock_validity

Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4

ext4: fix potential use after free after remounting with noblock_validity

Remount process will release system zone which was allocated before if
"noblock_validity" is specified. If we mount an ext4 file system to two
mountpoints with default mount options, and then remount one of them
with "noblock_validity", it may trigger a use after free problem when
someone accessing the other one.

# mount /dev/sda foo
# mount /dev/sda bar

User access mountpoint "foo" | Remount mountpoint "bar"
|
ext4_map_blocks() | ext4_remount()
check_block_validity() | ext4_setup_system_zone()
ext4_data_block_valid() | ext4_release_system_zone()
| free system_blks rb nodes
access system_blks rb nodes |
trigger use after free |

This problem can also be reproduced by one mountpint, At the same time,
add_system_zone() can get called during remount as well so there can be
racing ext4_data_block_valid() reading the rbtree at the same time.

This patch add RCU to protect system zone from releasing or building
when doing a remount which inverse current "noblock_validity" mount
option. It assign the rbtree after the whole tree was complete and
do actual freeing after rcu grace period, avoid any intermediate state.

Reported-by: syzbot+1e470567330b7ad711d5@syzkaller.appspotmail.com
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

show more ...


# 8fcc3a58 22-Aug-2019 Eric Whitney <enwlinux@gmail.com>

ext4: rework reserved cluster accounting when invalidating pages

The goal of this patch is to remove two references to the buffer delay
bit in ext4_da_page_release_reservation() as part of a larger

ext4: rework reserved cluster accounting when invalidating pages

The goal of this patch is to remove two references to the buffer delay
bit in ext4_da_page_release_reservation() as part of a larger effort
to remove all such references from ext4. These two references are
principally used to reduce the reserved block/cluster count when pages
are invalidated as a result of truncating, punching holes, or
collapsing a block range in a file. The entire function is removed
and replaced with code in ext4_es_remove_extent() that reduces the
reserved count as a side effect of removing a block range from delayed
and not unwritten extents in the extent status tree as is done when
truncating, punching holes, or collapsing ranges.

The code is written to minimize the number of searches descending from
rb tree roots for scalability.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 7963e5ac 22-Aug-2019 ZhangXiaoxu <zhangxiaoxu5@huawei.com>

ext4: treat buffers with write errors as containing valid data

I got some errors when I repair an ext4 volume which stacked by an
iscsi target:
Entry 'test60' in / (2) has deleted/unused inode 7

ext4: treat buffers with write errors as containing valid data

I got some errors when I repair an ext4 volume which stacked by an
iscsi target:
Entry 'test60' in / (2) has deleted/unused inode 73750. Clear?
It can be reproduced when the network not good enough.

When I debug this I found ext4 will read entry buffer from disk and
the buffer is marked with write_io_error.

If the buffer is marked with write_io_error, it means it already
wroten to journal, and not checked out to disk. IOW, the journal
is newer than the data in disk.
If this journal record 'delete test60', it means the 'test60' still
on the disk metadata.

In this case, if we read the buffer from disk successfully and create
file continue, the new journal record will overwrite the journal
which record 'delete test60', then the entry corruptioned.

So, use the buffer rather than read from disk if the buffer is marked
with write_io_error.

Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 22cfe4b4 22-Jul-2019 Eric Biggers <ebiggers@google.com>

ext4: add fs-verity read support

Make ext4_mpage_readpages() verify data as it is read from fs-verity
files, using the helper functions from fs/verity/.

To support both encryption and verity simult

ext4: add fs-verity read support

Make ext4_mpage_readpages() verify data as it is read from fs-verity
files, using the helper functions from fs/verity/.

To support both encryption and verity simultaneously, this required
refactoring the decryption workflow into a generic "post-read
processing" workflow which can do decryption, verification, or both.

The case where the ext4 block size is not equal to the PAGE_SIZE is not
supported yet, since in that case ext4_mpage_readpages() sometimes falls
back to block_read_full_page(), which does not support fs-verity yet.

Co-developed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>

show more ...


# c93d8f88 22-Jul-2019 Eric Biggers <ebiggers@google.com>

ext4: add basic fs-verity support

Add most of fs-verity support to ext4. fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files. It u

ext4: add basic fs-verity support

Add most of fs-verity support to ext4. fs-verity is a filesystem
feature that enables transparent integrity protection and authentication
of read-only files. It uses a dm-verity like mechanism at the file
level: a Merkle tree is used to verify any block in the file in
log(filesize) time. It is implemented mainly by helper functions in
fs/verity/. See Documentation/filesystems/fsverity.rst for the full
documentation.

This commit adds all of ext4 fs-verity support except for the actual
data verification, including:

- Adding a filesystem feature flag and an inode flag for fs-verity.

- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.

- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.

- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().

ext4 stores the verity metadata (Merkle tree and fsverity_descriptor)
past the end of the file, starting at the first 64K boundary beyond
i_size. This approach works because (a) verity files are readonly, and
(b) pages fully beyond i_size aren't visible to userspace but can be
read/written internally by ext4 with only some relatively small changes
to ext4. This approach avoids having to depend on the EA_INODE feature
and on rearchitecturing ext4's xattr support to support paging
multi-gigabyte xattrs into memory, and to support encrypting xattrs.
Note that the verity metadata *must* be encrypted when the file is,
since it contains hashes of the plaintext data.

This patch incorporates work by Theodore Ts'o and Chandan Rajendra.

Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>

show more ...


# cd2d9922 12-Aug-2019 Theodore Ts'o <tytso@mit.edu>

ext4: drop legacy pre-1970 encoding workaround

Originally, support for expanded timestamps had a bug in that pre-1970
times were erroneously encoded as being in the the 24th century. This
was fixed

ext4: drop legacy pre-1970 encoding workaround

Originally, support for expanded timestamps had a bug in that pre-1970
times were erroneously encoded as being in the the 24th century. This
was fixed in commit a4dad1ae24f8 ("ext4: Fix handling of extended
tv_sec") which landed in 4.4. Starting with 4.4, pre-1970 timestamps
were correctly encoded, but for backwards compatibility those
incorrectly encoded timestamps were mapped back to the pre-1970 dates.

Given that backwards compatibility workaround has been around for 4
years, and given that running e2fsck from e2fsprogs 1.43.2 and later
will offer to fix these timestamps (which has been released for 3
years), it's past time to drop the legacy workaround from the kernel.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# bb5835ed 11-Aug-2019 Theodore Ts'o <tytso@mit.edu>

ext4: add new ioctl EXT4_IOC_GET_ES_CACHE

For debugging reasons, it's useful to know the contents of the extent
cache. Since the extent cache contains much of what is in the fiemap
ioctl, use an fi

ext4: add new ioctl EXT4_IOC_GET_ES_CACHE

For debugging reasons, it's useful to know the contents of the extent
cache. Since the extent cache contains much of what is in the fiemap
ioctl, use an fiemap-style interface to return this information.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 1ad3ea6e 11-Aug-2019 Theodore Ts'o <tytso@mit.edu>

ext4: add a new ioctl EXT4_IOC_GETSTATE

The new ioctl EXT4_IOC_GETSTATE returns some of the dynamic state of
an ext4 inode for debugging purposes.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# b0c013e2 11-Aug-2019 Theodore Ts'o <tytso@mit.edu>

ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE

The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent
status cache to be cleared out. This is intended for use for
debugging.

Signed-off-

ext4: add a new ioctl EXT4_IOC_CLEAR_ES_CACHE

The new ioctl EXT4_IOC_CLEAR_ES_CACHE will force an inode's extent
status cache to be cleared out. This is intended for use for
debugging.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 7633b08b 21-Jun-2019 Theodore Ts'o <tytso@mit.edu>

ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()

Clean up namespace pollution by the inline_data code.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>


# ddce3b94 21-Jun-2019 Theodore Ts'o <tytso@mit.edu>

ext4: refactor initialize_dirent_tail()

Move the calculation of the location of the dirent tail into
initialize_dirent_tail(). Also prefix the function with ext4_ to fix
kernel namepsace polution.

ext4: refactor initialize_dirent_tail()

Move the calculation of the location of the dirent tail into
initialize_dirent_tail(). Also prefix the function with ext4_ to fix
kernel namepsace polution.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# f036adb3 21-Jun-2019 Theodore Ts'o <tytso@mit.edu>

ext4: rename "dirent_csum" functions to use "dirblock"

Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set()
don't actually operate on a directory entry, but a directory block.
And

ext4: rename "dirent_csum" functions to use "dirblock"

Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set()
don't actually operate on a directory entry, but a directory block.
And while they take a struct ext4_dir_entry *dirent as an argument, it
had better be the first directory at the beginning of the direct
block, or things will go very wrong.

Rename the following functions so that things make more sense, and
remove a lot of confusing casts along the way:

ext4_dirent_csum_verify -> ext4_dirblock_csum_verify
ext4_dirent_csum_set -> ext4_dirblock_csum_set
ext4_dirent_csum -> ext4_dirblock_csum
ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 3ae72562 19-Jun-2019 Gabriel Krisman Bertazi <krisman@collabora.com>

ext4: optimize case-insensitive lookups

Temporarily cache a casefolded version of the file name under lookup in
ext4_filename, to avoid repeatedly casefolding it. I got up to 30%
speedup on lookups

ext4: optimize case-insensitive lookups

Temporarily cache a casefolded version of the file name under lookup in
ext4_filename, to avoid repeatedly casefolding it. I got up to 30%
speedup on lookups of large directories (>100k entries), depending on
the length of the string under lookup.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 7ddf79a1 09-Jun-2019 Wang Shilong <wshilong@ddn.com>

ext4: only set project inherit bit for directory

It doesn't make any sense to have project inherit bits
for regular files, even though this won't cause any
problem, but it is better fix this.

Signe

ext4: only set project inherit bit for directory

It doesn't make any sense to have project inherit bits
for regular files, even though this won't cause any
problem, but it is better fix this.

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

show more ...


# b886ee3e 25-Apr-2019 Gabriel Krisman Bertazi <krisman@collabora.co.uk>

ext4: Support case-insensitive file name lookups

This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
supe

ext4: Support case-insensitive file name lookups

This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.

A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.

The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.

* dcache handling:

For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().

d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.

For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.

* on-disk data:

Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.

DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.

* Dealing with invalid sequences:

By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.

* Normalization algorithm:

The UTF-8 algorithms used to compare strings in ext4 is implemented
lives in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.

NFD seems to be the best normalization method for EXT4 because:

- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.

Although:

- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# c83ad55e 25-Apr-2019 Gabriel Krisman Bertazi <krisman@collabora.co.uk>

ext4: include charset encoding information in the superblock

Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesys

ext4: include charset encoding information in the superblock

Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesystems.
If the feature flag is not enabled, the entire filesystem will operate
on opaque byte sequences, respecting the original behavior.

The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.

The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# 877b5691 14-Apr-2019 Eric Biggers <ebiggers@google.com>

crypto: shash - remove shash_desc::flags

The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algori

crypto: shash - remove shash_desc::flags

The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.

With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP. These would all need to be fixed if any shash algorithm
actually started sleeping. For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API. However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.

Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk. It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.

Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

show more ...


# b01531db 20-Mar-2019 Eric Biggers <ebiggers@google.com>

fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext

->lookup() in an encrypted directory begins as follows:

1. fscrypt_prepare_lookup():
a. Try to load the directory's encry

fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext

->lookup() in an encrypted directory begins as follows:

1. fscrypt_prepare_lookup():
a. Try to load the directory's encryption key.
b. If the key is unavailable, mark the dentry as a ciphertext name
via d_flags.
2. fscrypt_setup_filename():
a. Try to load the directory's encryption key.
b. If the key is available, encrypt the name (treated as a plaintext
name) to get the on-disk name. Otherwise decode the name
(treated as a ciphertext name) to get the on-disk name.

But if the key is concurrently added, it may be found at (2a) but not at
(1a). In this case, the dentry will be wrongly marked as a ciphertext
name even though it was actually treated as plaintext.

This will cause the dentry to be wrongly invalidated on the next lookup,
potentially causing problems. For example, if the racy ->lookup() was
part of sys_mount(), then the new mount will be detached when anything
tries to access it. This is despite the mountpoint having a plaintext
path, which should remain valid now that the key was added.

Of course, this is only possible if there's a userspace race. Still,
the additional kernel-side race is confusing and unexpected.

Close the kernel-side race by changing fscrypt_prepare_lookup() to also
set the on-disk filename (step 2b), consistent with the d_flags update.

Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


# fe53cbc5 06-Apr-2019 Eric Biggers <ebiggers@google.com>

ext4: remove incorrect comment for NEXT_ORPHAN()

The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(),
which was moved by commit a7550b30ab70 ("ext4 crypto: migrate into vfs's
crypto

ext4: remove incorrect comment for NEXT_ORPHAN()

The comment above NEXT_ORPHAN() was meant for ext4_encrypted_inode(),
which was moved by commit a7550b30ab70 ("ext4 crypto: migrate into vfs's
crypto engine") but the comment was accidentally left in place. Since
ext4_encrypted_inode() has now been removed, just remove the comment.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

show more ...


# c9e716eb 14-Feb-2019 Andreas Dilger <adilger@dilger.ca>

ext4: don't update s_rev_level if not required

Don't update the superblock s_rev_level during mount if it isn't
actually necessary, only if superblock features are being set by
the kernel. This was

ext4: don't update s_rev_level if not required

Don't update the superblock s_rev_level during mount if it isn't
actually necessary, only if superblock features are being set by
the kernel. This was originally added for ext3 since it always
set the INCOMPAT_RECOVER and HAS_JOURNAL features during mount,
but this is not needed since no journal mode was added to ext4.

That will allow Geert to mount his 20-year-old ext2 rev 0.0 m68k
filesystem, as a testament of the backward compatibility of ext4.

Fixes: 0390131ba84f ("ext4: Allow ext4 to run without a journal")
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

show more ...


12345678910>>...60