Revision tags: v6.0 |
|
#
305a72ef |
| 01-Oct-2022 |
Dan Williams <dan.j.williams@intel.com> |
Merge branch 'for-6.1/nvdimm' into libnvdimm-for-next
Add v6.1 content on top of some straggling updates that missed v6.0.
|
Revision tags: v5.15.71 |
|
#
70d1b1a7 |
| 27-Sep-2022 |
Leon Romanovsky <leonro@nvidia.com> |
Merge branch 'mlx5-vfio' into mlx5-next
Merge net/mlx5 dependencies for device DMA logging.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
b3bbcc5d |
| 24-Sep-2022 |
Dan Williams <dan.j.williams@intel.com> |
Merge branch 'for-6.0/dax' into libnvdimm-fixes
Pick up another "Soft Reservation" fix for v6.0-final on top of some straggling nvdimm fixes that missed v5.19.
|
Revision tags: v5.15.70 |
|
#
74656d03 |
| 21-Sep-2022 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v6.0-rc6' into locking/core, to refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v5.15.69, v5.15.68 |
|
#
a108772d |
| 14-Sep-2022 |
Maxime Ripard <maxime@cerno.tech> |
Merge drm/drm-next into drm-misc-next
We need 6.0-rc1 to merge the backlight rework PR.
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
|
Revision tags: v5.15.67, v5.15.66 |
|
#
2a906db2 |
| 06-Sep-2022 |
Tony Lindgren <tony@atomide.com> |
Merge branch 'am5748-fix' into fixes
|
Revision tags: v5.15.65 |
|
#
10438976 |
| 02-Sep-2022 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into x86/mm, to refresh the branch
This branch is ~14k commits behind upstream, and has an old merge base from early into the merge window, refresh it to v6.0-rc3+fixes before q
Merge branch 'linus' into x86/mm, to refresh the branch
This branch is ~14k commits behind upstream, and has an old merge base from early into the merge window, refresh it to v6.0-rc3+fixes before queueing up new commits.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v5.15.64 |
|
#
53aa930d |
| 30-Aug-2022 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'sched/warnings' into sched/core, to pick up WARN_ON_ONCE() conversion commit
Merge in the BUG_ON() => WARN_ON_ONCE() conversion commit.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
917bda9a |
| 29-Aug-2022 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync drm-intel-next with v6.0-rc as well as recent drm-intel-gt-next.
Since drm-next does not have commit f0c70d41e4e8 ("drm/i915/guc: remove runtime info pri
Merge drm/drm-next into drm-intel-next
Sync drm-intel-next with v6.0-rc as well as recent drm-intel-gt-next.
Since drm-next does not have commit f0c70d41e4e8 ("drm/i915/guc: remove runtime info printing from time stamp logging") yet, only drm-intel-gt-next, will need to do that as part of the merge here to build.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
show more ...
|
Revision tags: v5.15.63, v5.15.62 |
|
#
93fbff11 |
| 17-Aug-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'i2c/make_remove_callback_void-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into next
Sync up with the latest I2C code base to get updated prototype of I2C bus
Merge branch 'i2c/make_remove_callback_void-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux into next
Sync up with the latest I2C code base to get updated prototype of I2C bus remove() method.
show more ...
|
Revision tags: v5.15.61 |
|
#
cf36ae3e |
| 17-Aug-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging for v6.0-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v5.15.60, v5.15.59 |
|
#
aad26f55 |
| 02-Aug-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet: "This was a moderately busy cycle for documentation, but nothing all that earth-shaking:
- Mor
Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet: "This was a moderately busy cycle for documentation, but nothing all that earth-shaking:
- More Chinese translations, and an update to the Italian translations.
The Japanese, Korean, and traditional Chinese translations are more-or-less unmaintained at this point, instead.
- Some build-system performance improvements.
- The removal of the archaic submitting-drivers.rst document, with the movement of what useful material that remained into other docs.
- Improvements to sphinx-pre-install to, hopefully, give more useful suggestions.
- A number of build-warning fixes
Plus the usual collection of typo fixes, updates, and more"
* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits) docs: efi-stub: Fix paths for x86 / arm stubs Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8 Docs/zh_CN: Update the translation of pci to 5.19-rc8 Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8 Docs/zh_CN: Update the translation of usage to 5.19-rc8 Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8 Docs/zh_CN: Update the translation of sparse to 5.19-rc8 Docs/zh_CN: Update the translation of kasan to 5.19-rc8 Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8 doc:it_IT: align Italian documentation docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst Documentation: process: Update email client instructions for Thunderbird docs: ABI: correct QEMU fw_cfg spec path doc/zh_CN: remove submitting-driver reference from docs docs: zh_TW: align to submitting-drivers removal docs: zh_CN: align to submitting-drivers removal docs: ko_KR: howto: remove reference to removed submitting-drivers docs: ja_JP: howto: remove reference to removed submitting-drivers docs: it_IT: align to submitting-drivers removal docs: process: remove outdated submitting-drivers.rst ...
show more ...
|
#
0fac198d |
| 01-Aug-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull acl updates from Christian Brauner: "Last cycle we introduced support for mounting over
Merge tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull acl updates from Christian Brauner: "Last cycle we introduced support for mounting overlayfs on top of idmapped mounts. While looking into additional testing we realized that posix acls don't really work correctly with stacking filesystems on top of idmapped layers.
We already knew what the fix were but it would require work that is more suitable for the merge window so we turned off posix acls for v5.19 for overlayfs on top of idmapped layers with Miklos routing my patch upstream in 72a8e05d4f66 ("Merge tag 'ovl-fixes-5.19-rc7' [..]").
This contains the work to support posix acls for overlayfs on top of idmapped layers. Since the posix acl fixes should use the new vfs{g,u}id_t work the associated branch has been merged in. (We sent a pull request for this earlier.)
We've also pulled in Miklos pull request containing my patch to turn of posix acls on top of idmapped layers. This allowed us to avoid rebasing the branch which we didn't like because we were already at rc7 by then. Merging it in allows this branch to first fix posix acls and then to cleanly revert the temporary fix it brought in by commit 4a47c6385bb4 ("ovl: turn of SB_POSIXACL with idmapped layers temporarily").
The last patch in this series adds Seth Forshee as a co-maintainer for idmapped mounts. Seth has been integral to all of this work and is also the main architect behind the filesystem idmapping work which ultimately made filesystems such as FUSE and overlayfs available in containers. He continues to be active in both development and review. I'm very happy he decided to help and he has my full trust. This increases the bus factor which is always great for work like this. I'm honestly very excited about this because I think in general we don't do great in the bringing on new maintainers department"
For more explanations of the ACL issues, see
https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/
* tag 'fs.idmapped.overlay.acl.v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: Add Seth Forshee as co-maintainer for idmapped mounts Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily" ovl: handle idmappings in ovl_get_acl() acl: make posix_acl_clone() available to overlayfs acl: port to vfs{g,u}id_t acl: move idmapped mount fixup into vfs_{g,s}etxattr() mnt_idmapping: add vfs[g,u]id_into_k[g,u]id()
show more ...
|
Revision tags: v5.19 |
|
#
82b6c2e7 |
| 29-Jul-2022 |
Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplu
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on Sapphire Rapids (Artem Bityutskiy).
* pm-cpufreq: cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask cpufreq: loongson2: fix Kconfig "its" grammar cpufreq: Warn users while freeing active policy cpufreq: ACPI: Add Zhaoxin/Centaur turbo boost control interface support cpufreq: Drop unnecessary cpus locking from store() cpufreq: Optimize cpufreq_show_cpus()
* pm-cpuidle: intel_idle: make SPR C1 and C1E be independent cpuidle: haltpoll: Add trace points for guest_halt_poll_ns grow/shrink
show more ...
|
Revision tags: v5.15.58 |
|
#
3c69a99b |
| 24-Jul-2022 |
Michael Ellerman <mpe@ellerman.id.au> |
Merge tag 'v5.19-rc7' into fixes
Merge v5.19-rc7 into fixes to bring in: d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
|
#
4effe18f |
| 24-Jul-2022 |
Jens Axboe <axboe@kernel.dk> |
Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_m
Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-send
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v5.15.57, v5.15.56 |
|
#
dc14036f |
| 18-Jul-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.19-rc7 into usb-next
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
0698461a |
| 18-Jul-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what
Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase.
Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end of evsel__config(), after what was added in:
49c692b7dfc9b6c0 ("perf offcpu: Accept allowed sample types only")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
Revision tags: v5.15.55 |
|
#
7c4d37c2 |
| 13-Jul-2022 |
Christian Brauner <brauner@kernel.org> |
Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"
This reverts commit 4a47c6385bb4e0786826e75bd4555aba32953653.
Now that we have a proper fix for POSIX ACLs with overlayfs on top o
Revert "ovl: turn of SB_POSIXACL with idmapped layers temporarily"
This reverts commit 4a47c6385bb4e0786826e75bd4555aba32953653.
Now that we have a proper fix for POSIX ACLs with overlayfs on top of idmapped layers revert the temporary fix.
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
show more ...
|
#
45598fd4 |
| 13-Jul-2022 |
Christian Brauner <brauner@kernel.org> |
Merge tag 'ovl-fixes-5.19-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into fs.idmapped.overlay.acl
Bring in Miklos' tree which contains the temporary fix for POSIX ACLs w
Merge tag 'ovl-fixes-5.19-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into fs.idmapped.overlay.acl
Bring in Miklos' tree which contains the temporary fix for POSIX ACLs with overlayfs on top of idmapped layers. We will add a proper fix on top of it and then revert the temporary fix.
Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
show more ...
|
#
816cd168 |
| 14-Jul-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096""
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/
net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers")
net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
72a8e05d |
| 12-Jul-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi: "Add a temporary fix for posix acls on idmapped mounts introduce
Merge tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fix from Miklos Szeredi: "Add a temporary fix for posix acls on idmapped mounts introduced in this cycle. A proper fix will be added in the next cycle"
* tag 'ovl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: turn off SB_POSIXACL with idmapped layers temporarily
show more ...
|
Revision tags: v5.15.54, v5.15.53 |
|
#
4a47c638 |
| 06-Jul-2022 |
Christian Brauner <brauner@kernel.org> |
ovl: turn of SB_POSIXACL with idmapped layers temporarily
This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases whe
ovl: turn of SB_POSIXACL with idmapped layers temporarily
This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it.
I have sent out an patch that fixes this and makes POSIX ACLs work correctly but the patch is a bit bigger and we're already at -rc5 so I recommend we simply don't raise SB_POSIXACL when idmapped layers are used. Then we can fix the VFS part described below for the next merge window so we can have good exposure in -next.
I'm going to give a rather detailed explanation to both the origin of the problem and mention the solution so people know what's going on.
Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem.
The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file:
setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
Now they to expose the whole rootfs to a container using an idmapped mount. So they first create:
mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work}
The user now creates an idmapped mount for the rootfs:
mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap
This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc.
Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container.
mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge
The user can do this in two ways:
(1) Mount overlayfs in the initial user namespace and expose it to the container.
(2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace.
Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts.
Now the user tries to retrieve the POSIX ACLs using the getfacl command
getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc
and to their surprise they see:
# file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r--
indicating the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); }
If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); }
As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account.
This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain:
setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc
The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations.
If the user chooses option (1) we get the following callchain for fscaps:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | -> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */
And if the user chooses option (2) we get:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | |-> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */
We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper.
For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping.
The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user().
To this end we simply move the idmapped mount translation into a separate step performed in vfs_{g,s}etxattr() instead of in posix_acl_fix_xattr_{from,to}_user().
To see how this fixes things let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { | | V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(&init_user_ns, 10000004); | } |_________________________________________________ | | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004); }
And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container:
idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536
sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(0(&init_user_ns, 10000004); | |_________________________________________________ | } | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004); }
The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call:
ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() /* on the underlying filesystem) ->inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission()
passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. The simple solution is to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission(). Since overlayfs doesn't cache ACLs they'll be released right after.
Link: https://github.com/brauner/mount-idmapped/issues/9 Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Amir Goldstein <amir73il@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: linux-unionfs@vger.kernel.org Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Fixes: bc70682a497c ("ovl: support idmapped layers") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
show more ...
|
Revision tags: v5.15.52, v5.15.51, v5.15.50 |
|
#
df672565 |
| 23-Jun-2022 |
Deming Wang <wangdeming@inspur.com> |
docs: Remove duplicate word
Delete duplicate words of "the".
Signed-off-by: Deming Wang <wangdeming@inspur.com> Link: https://lore.kernel.org/r/20220624014605.2007-1-wangdeming@inspur.com Signed-of
docs: Remove duplicate word
Delete duplicate words of "the".
Signed-off-by: Deming Wang <wangdeming@inspur.com> Link: https://lore.kernel.org/r/20220624014605.2007-1-wangdeming@inspur.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
show more ...
|
Revision tags: v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15 |
|
#
762f99f4 |
| 15-Jan-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 5.17 merge window.
|