#
4baf4a29 |
| 02-Feb-2025 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.75' into for/openbmc/dev-6.6
This is the 6.6.75 stable release
|
Revision tags: v6.6.75 |
|
#
850e696f |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Use d_children list to iterate simple_offset directories
[ Upstream commit b9b588f22a0c049a14885399e27625635ae6ef91 ]
The mtree mechanism has been effective at creating directory offsets tha
libfs: Use d_children list to iterate simple_offset directories
[ Upstream commit b9b588f22a0c049a14885399e27625635ae6ef91 ]
The mtree mechanism has been effective at creating directory offsets that are stable over multiple opendir instances. However, it has not been able to handle the subtleties of renames that are concurrent with readdir.
Instead of using the mtree to emit entries in the order of their offset values, use it only to map incoming ctx->pos to a starting entry. Then use the directory's d_children list, which is already maintained properly by the dcache, to find the next child to emit.
One of the sneaky things about this is that when the mtree-allocated offset value wraps (which is very rare), looking up ctx->pos++ is not going to find the next entry; it will return NULL. Instead, by following the d_children list, the offset values can appear in any order but all of the entries in the directory will be visited eventually.
Note also that the readdir() is guaranteed to reach the tail of this list. Entries are added only at the head of d_children, and readdir walks from its current position in that list towards its tail.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20241228175522.1854234-6-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
0f03dd06 |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Replace simple_offset end-of-directory detection
[ Upstream commit 68a3a65003145644efcbb651e91db249ccd96281 ]
According to getdents(3), the d_off field in each returned directory entry point
libfs: Replace simple_offset end-of-directory detection
[ Upstream commit 68a3a65003145644efcbb651e91db249ccd96281 ]
According to getdents(3), the d_off field in each returned directory entry points to the next entry in the directory. The d_off field in the last returned entry in the readdir buffer must contain a valid offset value, but if it points to an actual directory entry, then readdir/getdents can loop.
This patch introduces a specific fixed offset value that is placed in the d_off field of the last entry in a directory. Some user space applications assume that the EOD offset value is larger than the offsets of real directory entries, so the largest valid offset value is reserved for this purpose. This new value is never allocated by simple_offset_add().
When ->iterate_dir() returns, getdents{64} inserts the ctx->pos value into the d_off field of the last valid entry in the readdir buffer. When it hits EOD, offset_readdir() sets ctx->pos to the EOD offset value so the last entry is updated to point to the EOD marker.
When trying to read the entry at the EOD offset, offset_readdir() terminates immediately.
It is worth noting that using a Maple tree for directory offset value allocation does not guarantee a 63-bit range of values -- on platforms where "long" is a 32-bit type, the directory offset value range is still 0..(2^31 - 1). For broad compatibility with 32-bit user space, the largest tmpfs directory cookie value is now S32_MAX.
Fixes: 796432efab1e ("libfs: getdents() should return 0 after reaching EOD") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20241228175522.1854234-5-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
6b1de53b |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
Revert "libfs: Add simple_offset_empty()"
[ Upstream commit d7bde4f27ceef3dc6d72010a20d4da23db835a32 ]
simple_empty() and simple_offset_empty() perform the same task. The latter's use as a canary t
Revert "libfs: Add simple_offset_empty()"
[ Upstream commit d7bde4f27ceef3dc6d72010a20d4da23db835a32 ]
simple_empty() and simple_offset_empty() perform the same task. The latter's use as a canary to find bugs has not found any new issues. A subsequent patch will remove the use of the mtree for iterating directory contents, so revert back to using a similar mechanism for determining whether a directory is indeed empty.
Only one such mechanism is ever needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20241228175522.1854234-3-cel@kernel.org Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
a01bb1c5 |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Return ENOSPC when the directory offset range is exhausted
[ Upstream commit 903dc9c43a155e0893280c7472d4a9a3a83d75a6 ]
Testing shows that the EBUSY error return from mtree_alloc_cyclic() le
libfs: Return ENOSPC when the directory offset range is exhausted
[ Upstream commit 903dc9c43a155e0893280c7472d4a9a3a83d75a6 ]
Testing shows that the EBUSY error return from mtree_alloc_cyclic() leaks into user space. The ERRORS section of "man creat(2)" says:
> EBUSY O_EXCL was specified in flags and pathname refers > to a block device that is in use by the system > (e.g., it is mounted).
ENOSPC is closer to what applications expect in this situation.
Note that the normal range of simple directory offset values is 2..2^63, so hitting this error is going to be rare to impossible.
Fixes: 6faddda69f62 ("libfs: Add directory operations for stable offsets") Cc: stable@vger.kernel.org # v6.9+ Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Yang Erkun <yangerkun@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20241228175522.1854234-2-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
2b6da3fa |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
shmem: Fix shmem_rename2()
[ Upstream commit ad191eb6d6942bb835a0b20b647f7c53c1d99ca4 ]
When renaming onto an existing directory entry, user space expects the replacement entry to have the same dir
shmem: Fix shmem_rename2()
[ Upstream commit ad191eb6d6942bb835a0b20b647f7c53c1d99ca4 ]
When renaming onto an existing directory entry, user space expects the replacement entry to have the same directory offset as the original one.
Link: https://gitlab.alpinelinux.org/alpine/aports/-/issues/15966 Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-4-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
753828d6 |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Add simple_offset_rename() API
[ Upstream commit 5a1a25be995e1014abd01600479915683e356f5c ]
I'm about to fix a tmpfs rename bug that requires the use of internal simple_offset helpers that a
libfs: Add simple_offset_rename() API
[ Upstream commit 5a1a25be995e1014abd01600479915683e356f5c ]
I'm about to fix a tmpfs rename bug that requires the use of internal simple_offset helpers that are not available in mm/shmem.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-3-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
3e716f31 |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Fix simple_offset_rename_exchange()
[ Upstream commit 23cdd0eed3f1fff3af323092b0b88945a7950d8e ]
User space expects the replacement (old) directory entry to have the same directory offset af
libfs: Fix simple_offset_rename_exchange()
[ Upstream commit 23cdd0eed3f1fff3af323092b0b88945a7950d8e ]
User space expects the replacement (old) directory entry to have the same directory offset after the rename.
Suggested-by: Christian Brauner <brauner@kernel.org> Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20240415152057.4605-2-cel@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> [ cel: adjusted to apply to origin/linux-6.6.y ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
307f68e4 |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Add simple_offset_empty()
[ Upstream commit ecba88a3b32d733d41e27973e25b2bc580f64281 ]
For simple filesystems that use directory offset mapping, rely strictly on the directory offset map to
libfs: Add simple_offset_empty()
[ Upstream commit ecba88a3b32d733d41e27973e25b2bc580f64281 ]
For simple filesystems that use directory offset mapping, rely strictly on the directory offset map to tell when a directory has no children.
After this patch is applied, the emptiness test holds only the RCU read lock when the directory being tested has no children.
In addition, this adds another layer of confirmation that simple_offset_add/remove() are working as expected.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820143463.6328.7872919188371286951.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 5a1a25be995e ("libfs: Add simple_offset_rename() API") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
fc90bbcc |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Define a minimum directory offset
[ Upstream commit 7beea725a8ca412c6190090ce7c3a13b169592a1 ]
This value is used in several places, so make it a symbolic constant.
Reviewed-by: Jan Kara <j
libfs: Define a minimum directory offset
[ Upstream commit 7beea725a8ca412c6190090ce7c3a13b169592a1 ]
This value is used in several places, so make it a symbolic constant.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820142741.6328.12428356024575347885.stgit@91.116.238.104.host.secureserver.net Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: ecba88a3b32d ("libfs: Add simple_offset_empty()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
3bd97ebf |
| 24-Jan-2025 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: Re-arrange locking in offset_iterate_dir()
[ Upstream commit 3f6d810665dfde0d33785420618ceb03fba0619d ]
Liam and Matthew say that once the RCU read lock is released, xa_state is not safe to
libfs: Re-arrange locking in offset_iterate_dir()
[ Upstream commit 3f6d810665dfde0d33785420618ceb03fba0619d ]
Liam and Matthew say that once the RCU read lock is released, xa_state is not safe to re-use for the next xas_find() call. But the RCU read lock must be released on each loop iteration so that dput(), which might_sleep(), can be called safely.
Thus we are forced to walk the offset tree with fresh state for each directory entry. xa_find() can do this for us, though it might be a little less efficient than maintaining xa_state locally.
We believe that in the current code base, inode->i_rwsem provides protection for the xa_state maintained in offset_iterate_dir(). However, there is no guarantee that will continue to be the case in the future.
Since offset_iterate_dir() doesn't build xa_state locally any more, there's no longer a strong need for offset_find_next(). Clean up by rolling these two helpers together.
Suggested-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Message-ID: <170785993027.11135.8830043889278631735.stgit@91.116.238.104.host.secureserver.net> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170820142021.6328.15047865406275957018.stgit@91.116.238.104.host.secureserver.net Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.74, v6.6.73, v6.6.72, v6.6.71, v6.12.9, v6.6.70, v6.12.8, v6.6.69, v6.12.7, v6.6.68, v6.12.6, v6.6.67, v6.12.5, v6.6.66, v6.6.65, v6.12.4, v6.6.64, v6.12.3, v6.12.2, v6.6.63, v6.12.1, v6.12, v6.6.62, v6.6.61, v6.6.60, v6.6.59, v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54, v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45, v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33, v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23 |
|
#
3386ee82 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.10' into dev-6.6
This is the 6.6.10 stable release
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6 |
|
#
5b5599a7 |
| 04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
fs: new accessor methods for atime and mtime
[ Upstream commit 077c212f0344ae4198b2b51af128a94b614ccdf4 ]
Recently, we converted the ctime accesses in the kernel to use new accessor functions. Linu
fs: new accessor methods for atime and mtime
[ Upstream commit 077c212f0344ae4198b2b51af128a94b614ccdf4 ]
Recently, we converted the ctime accesses in the kernel to use new accessor functions. Linus recently pointed out though that if we add accessors for the atime and mtime, then that would allow us to seamlessly change how these timestamps are stored in the inode.
Add new accessor functions for the atime and mtime that mirror the accessors for the ctime.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185239.80830-1-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Stable-dep-of: 01fe654f78fd ("fs: cifs: Fix atime update check") Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
b97d6790 |
| 13-Dec-2023 |
Joel Stanley <joel@jms.id.au> |
Merge tag 'v6.6.6' into dev-6.6
This is the 6.6.6 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
#
53da7fe8 |
| 19-Nov-2023 |
Chuck Lever <chuck.lever@oracle.com> |
libfs: getdents() should return 0 after reaching EOD
[ Upstream commit 796432efab1e372d404e7a71cc6891a53f105051 ]
The new directory offset helpers don't conform with the convention of getdents() re
libfs: getdents() should return 0 after reaching EOD
[ Upstream commit 796432efab1e372d404e7a71cc6891a53f105051 ]
The new directory offset helpers don't conform with the convention of getdents() returning no more entries once a directory file descriptor has reached the current end-of-directory.
To address this, copy the logic from dcache_readdir() to mark the open directory file descriptor once EOD has been reached. Seeking resets the mark.
Reported-by: Tavian Barnes <tavianator@tavianator.com> Closes: https://lore.kernel.org/linux-fsdevel/20231113180616.2831430-1-tavianator@tavianator.com/ Fixes: 6faddda69f62 ("libfs: Add directory operations for stable offsets") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/170043792492.4628.15646203084646716134.stgit@bazille.1015granger.net Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
84422aee |
| 26-Sep-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner: "This contains the usual miscellaneous fixes and cleanups for vfs and
Merge tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner: "This contains the usual miscellaneous fixes and cleanups for vfs and individual fses:
Fixes: - Revert ki_pos on error from buffered writes for direct io fallback - Add missing documentation for block device and superblock handling for changes merged this cycle - Fix reiserfs flexible array usage - Ensure that overlayfs sets ctime when setting mtime and atime - Disable deferred caller completions with overlayfs writes until proper support exists
Cleanups: - Remove duplicate initialization in pipe code - Annotate aio kioctx_table with __counted_by"
* tag 'v6.6-rc4.vfs.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: overlayfs: set ctime when setting mtime and atime ntfs3: put resources during ntfs_fill_super() ovl: disable IOCB_DIO_CALLER_COMP porting: document superblock as block device holder porting: document new block device opening order fs/pipe: remove duplicate "offset" initializer fs-writeback: do not requeue a clean inode having skipped pages aio: Annotate struct kioctx_table with __counted_by direct_write_fallback(): on error revert the ->ki_pos update from buffered write reiserfs: Replace 1-element array with C99 style flex-array
show more ...
|
Revision tags: v6.5.5, v6.5.4 |
|
#
8287474a |
| 13-Sep-2023 |
Al Viro <viro@zeniv.linux.org.uk> |
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
If we fail filemap_write_and_wait_range() on the range the buffered write went into, we only report the "number of by
direct_write_fallback(): on error revert the ->ki_pos update from buffered write
If we fail filemap_write_and_wait_range() on the range the buffered write went into, we only report the "number of bytes which we direct-written", to quote the comment in there. Which is fine, but buffered write has already advanced iocb->ki_pos, so we need to roll that back. Otherwise we end up with e.g. write(2) advancing position by more than the amount it reports having written.
Fixes: 182c25e9c157 "filemap: update ki_pos in generic_perform_write" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Message-Id: <20230827214518.GU3390869@ZenIV> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
Revision tags: v6.5.3 |
|
#
c900529f |
| 12-Sep-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Forwarding to v6.6-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.5.2, v6.1.51, v6.5.1 |
|
#
3ef96fcf |
| 31-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "Many ext4 and jbd2 cleanups and bug fixes:
- Cleanups in the ext
Merge tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "Many ext4 and jbd2 cleanups and bug fixes:
- Cleanups in the ext4 remount code when going to and from read-only
- Cleanups in ext4's multiblock allocator
- Cleanups in the jbd2 setup/mounting code paths
- Performance improvements when appending to a delayed allocation file
- Miscellaneous syzbot and other bug fixes"
* tag 'ext4_for_linus-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits) ext4: fix slab-use-after-free in ext4_es_insert_extent() libfs: remove redundant checks of s_encoding ext4: remove redundant checks of s_encoding ext4: reject casefold inode flag without casefold feature ext4: use LIST_HEAD() to initialize the list_head in mballoc.c ext4: do not mark inode dirty every time when appending using delalloc ext4: rename s_error_work to s_sb_upd_work ext4: add periodic superblock update check ext4: drop dio overwrite only flag and associated warning ext4: add correct group descriptors and reserved GDT blocks to system zone ext4: remove unused function declaration ext4: mballoc: avoid garbage value from err ext4: use sbi instead of EXT4_SB(sb) in ext4_mb_new_blocks_simple() ext4: change the type of blocksize in ext4_mb_init_cache() ext4: fix unttached inode after power cut with orphan file feature enabled jbd2: correct the end of the journal recovery scan range ext4: ext4_get_{dev}_journal return proper error value ext4: cleanup ext4_get_dev_journal() and ext4_get_journal() jbd2: jbd2_journal_init_{dev,inode} return proper error return value jbd2: drop useless error tag in jbd2_journal_wipe() ...
show more ...
|
#
1ac731c5 |
| 30-Aug-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.6 merge window.
|
Revision tags: v6.1.50 |
|
#
de16588a |
| 28-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes
Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner: "This contains the usual miscellaneous features, cleanups, and fixes for vfs and individual filesystems.
Features:
- Block mode changes on symlinks and rectify our broken semantics
- Report file modifications via fsnotify() for splice
- Allow specifying an explicit timeout for the "rootwait" kernel command line option. This allows to timeout and reboot instead of always waiting indefinitely for the root device to show up
- Use synchronous fput for the close system call
Cleanups:
- Get rid of open-coded lockdep workarounds for async io submitters and replace it all with a single consolidated helper
- Simplify epoll allocation helper
- Convert simple_write_begin and simple_write_end to use a folio
- Convert page_cache_pipe_buf_confirm() to use a folio
- Simplify __range_close to avoid pointless locking
- Disable per-cpu buffer head cache for isolated cpus
- Port ecryptfs to kmap_local_page() api
- Remove redundant initialization of pointer buf in pipe code
- Unexport the d_genocide() function which is only used within core vfs
- Replace printk(KERN_ERR) and WARN_ON() with WARN()
Fixes:
- Fix various kernel-doc issues
- Fix refcount underflow for eventfds when used as EFD_SEMAPHORE
- Fix a mainly theoretical issue in devpts
- Check the return value of __getblk() in reiserfs
- Fix a racy assert in i_readcount_dec
- Fix integer conversion issues in various functions
- Fix LSM security context handling during automounts that prevented NFS superblock sharing"
* tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits) cachefiles: use kiocb_{start,end}_write() helpers ovl: use kiocb_{start,end}_write() helpers aio: use kiocb_{start,end}_write() helpers io_uring: use kiocb_{start,end}_write() helpers fs: create kiocb_{start,end}_write() helpers fs: add kerneldoc to file_{start,end}_write() helpers io_uring: rename kiocb_end_write() local helper splice: Convert page_cache_pipe_buf_confirm() to use a folio libfs: Convert simple_write_begin and simple_write_end to use a folio fs/dcache: Replace printk and WARN_ON by WARN fs/pipe: remove redundant initialization of pointer buf fs: Fix kernel-doc warnings devpts: Fix kernel-doc warnings doc: idmappings: fix an error and rephrase a paragraph init: Add support for rootwait timeout parameter vfs: fix up the assert in i_readcount_dec fs: Fix one kernel-doc comment docs: filesystems: idmappings: clarify from where idmappings are taken fs/buffer.c: disable per-CPU buffer_head cache for isolated CPUs vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing ...
show more ...
|
#
ecd7db20 |
| 28-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull libfs and tmpfs updates from Christian Brauner: "This cycle saw a lot of work for tmpfs that required change
Merge tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull libfs and tmpfs updates from Christian Brauner: "This cycle saw a lot of work for tmpfs that required changes to the vfs layer. Andrew, Hugh, and I decided to take tmpfs through vfs this cycle. Things will go back to mm next cycle.
Features ========
- By far the biggest work is the quota support for tmpfs. New tmpfs quota infrastructure is added to support it and a new QFMT_SHMEM uapi option is exposed.
This offers user and group quotas to tmpfs (project quotas will be added later). Similar to other filesystems tmpfs quota are not supported within user namespaces yet.
- Add support for user xattrs. While tmpfs already supports security xattrs (security.*) and POSIX ACLs for a long time it lacked support for user xattrs (user.*). With this pull request tmpfs will be able to support a limited number of user xattrs.
This is accompanied by a fix (see below) to limit persistent simple xattr allocations.
- Add support for stable directory offsets. Currently tmpfs relies on the libfs provided cursor-based mechanism for readdir. This causes issues when a tmpfs filesystem is exported via NFS.
NFS clients do not open directories. Instead, each server-side readdir operation opens the directory, reads it, and then closes it. Since the cursor state for that directory is associated with the opened file it is discarded after each readdir operation. Such directory offsets are not just cached by NFS clients but also various userspace libraries based on these clients.
As it stands there is no way to invalidate the caches when directory offsets have changed and the whole application depends on unchanging directory offsets.
At LSFMM we discussed how to solve this problem and decided to support stable directory offsets. libfs now allows filesystems like tmpfs to use an xarrary to map a directory offset to a dentry. This mechanism is currently only used by tmpfs but can be supported by others as well.
Fixes =====
- Change persistent simple xattrs allocations in libfs from GFP_KERNEL to GPF_KERNEL_ACCOUNT so they're subject to memory cgroup limits. Since this is a change to libfs it affects both tmpfs and kernfs.
- Correctly verify {g,u}id mount options.
A new filesystem context is created via fsopen() which records the namespace that becomes the owning namespace of the superblock when fsconfig(FSCONFIG_CMD_CREATE) is called for filesystems that are mountable in namespaces. However, fsconfig() calls can occur in a namespace different from the namespace where fsopen() has been called.
Currently, when fsconfig() is called to set {g,u}id mount options the requested {g,u}id is mapped into a k{g,u}id according to the namespace where fsconfig() was called from. The resulting k{g,u}id is not guaranteed to be resolvable in the namespace of the filesystem (the one that fsopen() was called in).
This means it's possible for an unprivileged user to create files owned by any group in a tmpfs mount since it's possible to set the setid bits on the tmpfs directory.
The contract for {g,u}id mount options and {g,u}id values in general set from userspace has always been that they are translated according to the caller's idmapping. In so far, tmpfs has been doing the correct thing. But since tmpfs is mountable in unprivileged contexts it is also necessary to verify that the resulting {k,g}uid is representable in the namespace of the superblock to avoid such bugs.
The new mount api's cross-namespace delegation abilities are already widely used. Having talked to a bunch of userspace this is the most faithful solution with minimal regression risks"
* tag 'v6.6-vfs.tmpfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs mm: invalidation check mapping before folio_contains tmpfs: trivial support for direct IO tmpfs,xattr: enable limited user extended attributes tmpfs: track free_ispace instead of free_inodes xattr: simple_xattr_set() return old_xattr to be freed tmpfs: verify {g,u}id mount options correctly shmem: move spinlock into shmem_recalc_inode() to fix quota support libfs: Remove parent dentry locking in offset_iterate_dir() libfs: Add a lock class for the offset map's xa_lock shmem: stable directory offsets shmem: Refactor shmem_symlink() libfs: Add directory operations for stable offsets shmem: fix quota lock nesting in huge hole handling shmem: Add default quota limit mount options shmem: quota support shmem: prepare shmem quota infrastructure quota: Check presence of quota operation structures instead of ->quota_read and ->quota_write callbacks shmem: make shmem_get_inode() return ERR_PTR instead of NULL shmem: make shmem_inode_acct_block() return error
show more ...
|
#
615e9583 |
| 28-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts
Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems.
The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes.
Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache.
Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications).
If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates.
This introduces fine-grained timestamps that are used when they are actively queried.
This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one.
As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used.
Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps.
Various preparatory changes, fixes and cleanups are included:
- Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks.
- Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors.
- Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in.
- Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers.
- Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding"
* tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
show more ...
|
Revision tags: v6.5, v6.1.49, v6.1.48, v6.1.46 |
|
#
af494af3 |
| 14-Aug-2023 |
Eric Biggers <ebiggers@google.com> |
libfs: remove redundant checks of s_encoding
Now that neither ext4 nor f2fs allows inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support
libfs: remove redundant checks of s_encoding
Now that neither ext4 nor f2fs allows inodes with the casefold flag to be instantiated when unsupported, it's unnecessary to repeatedly check for support later on during random filesystem operations.
Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
show more ...
|
#
5522d9f7 |
| 21-Aug-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
libfs: Convert simple_write_begin and simple_write_end to use a folio
Remove a number of implicit calls to compound_head() and various calls to compatibility functions. This is not sufficient to en
libfs: Convert simple_write_begin and simple_write_end to use a folio
Remove a number of implicit calls to compound_head() and various calls to compatibility functions. This is not sufficient to enable support for large folios; generic_perform_write() must be converted first.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230821141322.2535459-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|