#
0b9f748d |
| 12-Feb-2024 |
Stefan Hajnoczi <stefanha@redhat.com> |
virtiofs: forbid newlines in tags
[ Upstream commit 40488cc16f7ea0d193a4e248f0d809c25cc377db ]
Newlines in virtiofs tags are awkward for users and potential vectors for string injection attacks.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
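The check this commit describes can be illustrated with a small userspace sketch. It is not the kernel's code: `tag_validate` is a hypothetical helper, and the fixed-size, possibly unterminated buffer is an assumption about how a tag might be stored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel implementation): validate a tag held
 * in a buffer of `len` bytes that may or may not be NUL-terminated.
 * Returns 0 if the tag is acceptable, -EINVAL if it contains a newline.
 */
static int tag_validate(const char *buf, size_t len)
{
	assert(buf != NULL);

	/* The tag ends at the first NUL, or at len if there is none. */
	const char *end = memchr(buf, '\0', len);
	size_t tag_len = end ? (size_t)(end - buf) : len;

	/* Only bytes that are part of the tag are inspected. */
	if (memchr(buf, '\n', tag_len) != NULL)
		return -EINVAL;
	return 0;
}
```

Rejecting the tag outright, rather than escaping the newline, keeps every later consumer of the tag free of special cases.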
|
#
1ea7ca1b |
| 15-Jun-2023 |
Jane Chu <jane.chu@oracle.com> |
dax: enable dax fault handler to report VM_FAULT_HWPOISON
When multiple processes mmap() a dax file, then at some point, a process issues a 'load' and consumes a hwpoison, the process receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb set for the poison scope. Soon after, any other process issues a 'load' to the poisoned page (that is unmapped from the kernel side by memory_failure), it receives a SIGBUS with si_code = BUS_ADRERR and without valid si_lsb.
This is confusing to users and differs from a page fault caused by poison in RAM; some helpful information is also lost.
Channel dax backend driver's poison detection to the filesystem such that instead of reporting VM_FAULT_SIGBUS, it could report VM_FAULT_HWPOISON.
If user level block IO syscalls fail due to poison, the errno will be converted to EIO to maintain block API consistency.
Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
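The two reporting paths in this message can be sketched as a pair of mappings. This is a userspace illustration only: the enum, both function names, and the fallback `EHWPOISON` value are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>
#include <errno.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* generic Linux value; fallback for other libcs */
#endif

/* Illustrative stand-ins for the fault codes named in the message. */
enum fault_result { FAULT_OK, FAULT_SIGBUS, FAULT_HWPOISON };

/* Fault path: keep poison distinguishable from a generic bus error. */
static enum fault_result fault_from_errno(int err)
{
	assert(err <= 0);	/* kernel-style: 0 or a negative errno */

	if (err == 0)
		return FAULT_OK;
	if (err == -EHWPOISON)
		return FAULT_HWPOISON;
	return FAULT_SIGBUS;
}

/* Block I/O syscall path: callers expect EIO, so poison is folded into it. */
static int blk_errno_from_mem(int err)
{
	assert(err <= 0);
	return err == -EHWPOISON ? -EIO : err;
}
```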
|
#
73fb2c8b |
| 22-Jun-2022 |
Deming Wang <wangdeming@inspur.com> |
virtio_fs: Modify format for virtio_fs_direct_access
We should isolate operators with spaces.
Signed-off-by: Deming Wang <wangdeming@inspur.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
1e5b9e04 |
| 09-Jun-2022 |
Deming Wang <wangdeming@inspur.com> |
virtiofs: delete unused parameter for virtio_fs_cleanup_vqs
The fs parameter is unused, so delete it.
Signed-off-by: Deming Wang <wangdeming@inspur.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
e511c4a3 |
| 13-May-2022 |
Jane Chu <jane.chu@oracle.com> |
dax: introduce DAX_RECOVERY_WRITE dax access mode
Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce
	enum dax_access_mode {
		DAX_ACCESS,
		DAX_RECOVERY_WRITE,
	}
where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write.
Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
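A toy sketch of how an explicit access mode can drive a decision is shown here. The enum is taken from the message; `range_accessible` and its policy are hypothetical and only illustrate why the mode is passed explicitly.

```c
#include <assert.h>
#include <stdbool.h>

enum dax_access_mode {
	DAX_ACCESS,
	DAX_RECOVERY_WRITE,
};

/*
 * Hypothetical policy, not kernel code: a range containing poison is only
 * handed out when the caller explicitly asks for a recovery write.
 */
static bool range_accessible(bool range_has_poison, enum dax_access_mode mode)
{
	assert(mode == DAX_ACCESS || mode == DAX_RECOVERY_WRITE);

	if (!range_has_poison)
		return true;
	return mode == DAX_RECOVERY_WRITE;
}
```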
|
#
dc90f084 |
| 15-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
mm: don't include <linux/memremap.h> in <linux/mm.h>
Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h.
Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: "Sierra Guiza, Alejandro (Alex)" <alex.sierra@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Cc: Karol Herbst <kherbst@redhat.com> Cc: Lyude Paul <lyude@redhat.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
d9679d00 |
| 13-Oct-2021 |
Michael S. Tsirkin <mst@redhat.com> |
virtio: wrap config->reset calls
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
#
7ac5360c |
| 15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove the copy_from_iter and copy_to_iter methods
These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants, fuse and dcssblk use plain ones, and device mapper redirects to the underlying device.
Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
30c6828a |
| 15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove the DAXDEV_F_SYNC flag
Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
780b1b95 |
| 25-Nov-2021 |
Jeffle Xu <jefflexu@linux.alibaba.com> |
fuse: make DAX mount option a tri-state
We add 'always', 'never', and 'inode' (default). '-o dax' continues to operate the same which is equivalent to 'always'.
The following behavior is consistent with that on ext4/xfs:
- The default behavior (when neither '-o dax' nor '-o dax=always|never|inode' option is specified) is equal to 'inode' mode, while 'dax=inode' won't be printed among the mount option list.
- The 'inode' mode is only advisory. It will silently fallback to 'never' mode if fuse server doesn't support that.
Also noted that by the time of this commit, 'inode' mode is actually equal to 'always' mode, before the per inode DAX flag is introduced in the following patch.
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
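The option handling described in this message can be sketched as a parser plus a "show" routine. All names here are hypothetical, and whether an explicitly given 'dax=inode' is printed back is an assumption of this sketch; the message only states that the unspecified default is not printed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; the default is kept apart from an explicit 'inode'. */
enum dax_mode_sketch {
	DAX_SK_DEFAULT,	/* nothing specified: behaves as 'inode' */
	DAX_SK_ALWAYS,
	DAX_SK_NEVER,
	DAX_SK_INODE,	/* 'dax=inode' given explicitly */
};

/* opt is NULL when no dax option was given, otherwise the option text. */
static int dax_opt_parse(const char *opt, enum dax_mode_sketch *out)
{
	assert(out != NULL);

	if (opt == NULL)
		*out = DAX_SK_DEFAULT;
	else if (strcmp(opt, "dax") == 0 || strcmp(opt, "dax=always") == 0)
		*out = DAX_SK_ALWAYS;	/* bare '-o dax' keeps its old meaning */
	else if (strcmp(opt, "dax=never") == 0)
		*out = DAX_SK_NEVER;
	else if (strcmp(opt, "dax=inode") == 0)
		*out = DAX_SK_INODE;
	else
		return -EINVAL;
	return 0;
}

/* Text for the mount option list, or NULL when nothing should be printed. */
static const char *dax_opt_show(enum dax_mode_sketch mode)
{
	switch (mode) {
	case DAX_SK_ALWAYS:
		return "dax=always";
	case DAX_SK_NEVER:
		return "dax=never";
	case DAX_SK_INODE:
		return "dax=inode";	/* assumption: explicit choice is shown */
	case DAX_SK_DEFAULT:
		break;
	}
	return NULL;
}
```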
|
#
fb08a190 |
| 29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dax: simplify the dax_device <-> gendisk association
Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicit calls from the block drivers that want to associate their gendisk with a dax_device.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7c594bbd |
| 02-Nov-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
virtiofs: use strscpy for copying the queue name
Always null terminate fsvq->name.
Reported-by: kernel test robot <lkp@intel.com> Fixes: b43b7e81eb2b ("virtiofs: provide a helper function for virtqueue initialization") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
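The guarantee this commit relies on, that the destination is always terminated, can be shown with a userspace stand-in. `bounded_copy` is a hypothetical function written for this sketch; it mimics the documented strscpy() contract (return the copied length, or -E2BIG on truncation) but is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Userspace stand-in with strscpy()-like semantics: copy at most size - 1
 * bytes, always NUL-terminate dst, and return the number of bytes copied
 * or -E2BIG if src did not fit (or size is 0).
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	assert(dst != NULL && src != NULL);

	if (size == 0)
		return -E2BIG;

	size_t i;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';	/* terminated on every path, unlike strncpy() */

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

With strncpy(), a source of exactly the buffer size would leave the destination unterminated; here the copy is truncated and the caller is told so through the return value.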
|
#
c191cd07 |
| 21-Oct-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
fuse: clean up fuse_mount destruction
1. call fuse_mount_destroy() for open coded variants
2. before deactivate_locked_super() don't need fuse_mount destruction since that will now be done (if ->s_fs_info is not cleared)
3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the regular pattern can be used
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a27c061a |
| 21-Oct-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
fuse: get rid of fuse_put_super()
The ->put_super callback is called from generic_shutdown_super() in case of a fully initialized sb. This is called from kill_***_super(), which is called from ->kill_sb instances.
Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the reference to the fuse_conn, while it does the same on each error case during sb setup.
This patch moves the destruction from fuse_put_super() to fuse_mount_destroy(), called at the end of all ->kill_sb instances. A follow-up patch will clean up the error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
d534d31d |
| 21-Oct-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
fuse: check s_root when destroying sb
Checking "fm" works because currently sb->s_fs_info is cleared on error paths; however, sb->s_root is what generic_shutdown_super() checks to determine whether the sb was fully initialized or not.
This change will allow cleanup of sb setup error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
84c21507 |
| 04-Aug-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
fuse: name fs_context consistently
Naming convention under fs/fuse/:
	struct fuse_conn *fc;
	struct fs_context *fsc;
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
fe0a7bd8 |
| 04-Jun-2021 |
Greg Kurz <groug@kaod.org> |
fuse: add dedicated filesystem context ops for submounts
The creation of a submount is open-coded in fuse_dentry_automount(). This brings a lot of complexity and we recently had to fix bugs because we weren't setting SB_BORN or because we were unlocking sb->s_umount before sb was fully configured. Most of these could have been avoided by using the mount API instead of open-coding.
Basically, this means coming up with a proper ->get_tree() implementation for submounts and call vfs_get_tree(), or better fc_mount().
The creation of the superblock for submounts is quite different from the root mount. Especially, it doesn't require to allocate a FUSE filesystem context, nor to parse parameters.
Introduce a dedicated context ops for submounts to make this clear. This is just a placeholder for now, fuse_get_tree_submount() will be populated in a subsequent patch.
Only visible change is that we stop allocating/freeing a useless FUSE filesystem context with submounts.
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
2d82ab25 |
| 20-May-2021 |
Greg Kurz <groug@kaod.org> |
virtiofs: propagate sync() to file server
Even if POSIX doesn't mandate it, linux users legitimately expect sync() to flush all data and metadata to physical storage when it is located on the same system. This isn't happening with virtiofs though: sync() inside the guest returns right away even though data still needs to be flushed from the host page cache.
This is easily demonstrated by doing the following in the guest:
	$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
	5120+0 records in
	5120+0 records out
	5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
	sync() = 0 <0.024068>
and start the following in the host when the 'dd' command completes in the guest:
	$ strace -T -e fsync /usr/bin/sync virtiofs/foo
	fsync(3) = 0 <10.371640>
There are no good reasons not to honor the expected behavior of sync() actually: it gives an unrealistic impression that virtiofs is super fast and that data has safely landed on HW, which isn't the case obviously.
Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS request type for this purpose. Provision a 64-bit placeholder for possible future extensions. Since the file server cannot handle the wait == 0 case, we skip it to avoid a gratuitous roundtrip. Note that this is per-superblock: a FUSE_SYNCFS is sent for the root mount and for each submount.
Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in the file server is treated as permanent success. This ensures compatibility with older file servers: the client will get the current behavior of sync() not being propagated to the file server.
Note that such an operation allows the file server to DoS sync(). Since a typical FUSE file server is an untrusted piece of software running in userspace, this is disabled by default. Only enable it with virtiofs for now since virtiofsd is supposedly trusted by the guest kernel.
Reported-by: Robert Krawitz <rlk@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
0a7419c6 |
| 14-Apr-2021 |
Miklos Szeredi <mszeredi@redhat.com> |
virtiofs: fix userns
get_user_ns() is done twice (once in virtio_fs_get_tree() and once in fuse_conn_init()), resulting in a reference leak.
Also looks better to use fsc->user_ns (which *should* be the current_user_ns() at this point).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
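The leak pattern can be pictured with a toy reference counter. Everything here is hypothetical (type and function names included); the point is only that when the init routine takes its own reference, a caller that takes one as well leaves the count one too high after teardown.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted object standing in for a user namespace. */
struct ns_sketch {
	int refs;
};

static struct ns_sketch *ns_get(struct ns_sketch *ns)
{
	assert(ns != NULL);
	ns->refs++;
	return ns;
}

static void ns_put(struct ns_sketch *ns)
{
	assert(ns != NULL && ns->refs > 0);
	ns->refs--;
}

struct conn_sketch {
	struct ns_sketch *ns;
};

/* The init routine owns the reference; callers pass the pointer as-is. */
static void conn_init(struct conn_sketch *c, struct ns_sketch *ns)
{
	assert(c != NULL);
	c->ns = ns_get(ns);
}

static void conn_release(struct conn_sketch *c)
{
	assert(c != NULL);
	ns_put(c->ns);
	c->ns = NULL;
}
```

A caller that wrote `conn_init(&c, ns_get(ns))` would reproduce the bug: after `conn_release()` the count would still be one above its starting value.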
|
#
07595bfa |
| 13-Apr-2021 |
Jiapeng Chong <jiapeng.chong@linux.alibaba.com> |
virtiofs: remove useless function
Fix the following clang warning:
fs/fuse/virtio_fs.c:130:35: warning: unused function 'vq_to_fpq' [-Wunused-function].
Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
a7f0d7aa |
| 18-Mar-2021 |
Connor Kuehl <ckuehl@redhat.com> |
virtiofs: split requests that exceed virtqueue size
If an incoming FUSE request can't fit on the virtqueue, the request is placed onto a workqueue so a worker can try to resubmit it later where there will (hopefully) be space for it next time.
This is fine for requests that aren't larger than a virtqueue's maximum capacity. However, if a request's size exceeds the maximum capacity of the virtqueue (even if the virtqueue is empty), it will be doomed to a life of being placed on the workqueue, removed, discovered it won't fit, and placed on the workqueue yet again.
Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect Descriptors) of the virtio spec:
"A driver MUST NOT create a descriptor chain longer than the Queue Size of the device."
To fix this, limit the number of pages FUSE will use for an overall request. This way, each request can realistically fit on the virtqueue when it is decomposed into a scattergather list and avoid violating section 2.6.5.3.1 of the virtio spec.
Signed-off-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
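The limit can be sketched as a simple clamp. The function name and the overhead constant are assumptions made for this example: some descriptors are reserved for the non-data parts of a request, and the remaining room caps the page count.

```c
#include <assert.h>

/*
 * Assumption for this sketch: a few scatter-gather elements are needed for
 * the non-data parts of a request. The value 4 is illustrative.
 */
#define HEADER_SG_OVERHEAD 4u

/* Clamp the per-request page limit so a full request fits on the queue. */
static unsigned int max_pages_for_queue(unsigned int cur_limit,
					unsigned int queue_size)
{
	assert(queue_size > HEADER_SG_OVERHEAD);

	unsigned int room = queue_size - HEADER_SG_OVERHEAD;
	return cur_limit < room ? cur_limit : room;
}
```

With the limit applied up front, a request can never be larger than an empty virtqueue, so the resubmission worker is only ever waiting for space, not for the impossible.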
|
#
c79c5e01 |
| 17-Mar-2021 |
Luis Henriques <lhenriques@suse.de> |
virtiofs: fix memory leak in virtio_fs_probe()
When accidentally passing twice the same tag to qemu, kmemleak ended up reporting a memory leak in virtiofs. Also, looking at the log I saw the following error (that's when I realised the duplicated tag):
virtiofs: probe of virtio5 failed with error -17
Here's the kmemleak log for reference:
	unreferenced object 0xffff888103d47800 (size 1024):
	  comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
	  hex dump (first 32 bytes):
	    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
	    ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
	  backtrace:
	    [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
	    [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
	    [<000000004d6baf3c>] really_probe+0xea/0x430
	    [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
	    [<00000000196f47a7>] __driver_attach+0x98/0x140
	    [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
	    [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
	    [<0000000032b09ba7>] driver_register+0x8f/0xe0
	    [<00000000cdd55998>] 0xffffffffa002c013
	    [<000000000ea196a2>] do_one_initcall+0x64/0x2e0
	    [<0000000008f727ce>] do_init_module+0x5c/0x260
	    [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
	    [<00000000ad2f48c6>] do_syscall_64+0x33/0x40
	    [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: stable@vger.kernel.org Signed-off-by: Luis Henriques <lhenriques@suse.de> Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
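Error -17 is -EEXIST on Linux, which fits a duplicate tag being rejected at registration. The general shape of the fix, releasing everything allocated before the failing step, can be sketched in userspace. All names, the tag buffer size, and the allocation counter are hypothetical; this is not the driver's probe routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct fs_instance {
	char tag[37];			/* illustrative size */
	void *queues;			/* stands in for per-device state */
	struct fs_instance *next;
};

static struct fs_instance *instances;
static int live_allocs;			/* lets a test observe leaks */

static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);
	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Register an instance; a tag that is already present is rejected. */
static int instance_add(struct fs_instance *fs)
{
	for (struct fs_instance *it = instances; it; it = it->next)
		if (strcmp(it->tag, fs->tag) == 0)
			return -EEXIST;
	fs->next = instances;
	instances = fs;
	return 0;
}

static int probe_sketch(const char *tag)
{
	int ret = -ENOMEM;

	assert(tag != NULL);

	struct fs_instance *fs = counted_alloc(sizeof(*fs));
	if (!fs)
		return -ENOMEM;
	snprintf(fs->tag, sizeof(fs->tag), "%s", tag);

	fs->queues = counted_alloc(1024);
	if (!fs->queues)
		goto out_free;

	ret = instance_add(fs);
	if (ret < 0)
		goto out_free;
	return 0;

out_free:
	/* Release everything allocated above, not just the container. */
	counted_free(fs->queues);
	counted_free(fs);
	return ret;
}
```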
|
#
3f9b9efd |
| 09-Feb-2021 |
Vivek Goyal <vgoyal@redhat.com> |
virtiofs: Fail dax mount if device does not support it
Right now "mount -t virtiofs -o dax myfs /mnt/virtiofs" succeeds even if the filesystem device does not have a cache window and hence DAX can't be supported.
This gives users the false impression that they are using DAX with virtiofs when in fact they are not.
Fix this by returning error if dax can't be supported and user has asked for it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
833c5a42 |
| 11-Nov-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
virtiofs: clean up error handling in virtio_fs_get_tree()
Avoid duplicating error cleanup.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
#
514b5e3f |
| 11-Nov-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
fuse: get rid of fuse_mount refcount
Fuse mount now only ever has a refcount of one (before being freed) so the count field is unnecessary.
Remove the refcounting and fold fuse_mount_put() into callers. The only caller of fuse_mount_put() where fm->fc was NULL is fuse_dentry_automount() and here the fuse_conn_put() can simply be omitted.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|