Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39 |
|
#
e9d7d3cb |
| 05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
procfs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
procfs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-65-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
Revision tags: v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30 |
|
#
b0072734 |
| 22-May-2023 |
David Howells <dhowells@redhat.com> |
tty, proc, kernfs, random: Use copy_splice_read()
Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into
tty, proc, kernfs, random: Use copy_splice_read()
Use copy_splice_read() for tty, procfs, kernfs and random files rather than going through generic_file_splice_read() as they just copy the file into the output buffer and don't splice pages. This avoids the need for them to have a ->read_folio() to satisfy filemap_splice_read().
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: Christoph Hellwig <hch@lst.de> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: John Hubbard <jhubbard@nvidia.com> cc: David Hildenbrand <david@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: Miklos Szeredi <miklos@szeredi.hu> cc: Arnd Bergmann <arnd@arndb.de> cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
Revision tags: v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61 |
|
#
3f61631d |
| 14-Aug-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
take care to handle NULL ->proc_lseek()
Easily done now, just by clearing FMODE_LSEEK in ->f_mode during proc_reg_open() for such entries.
Fixes: 868941b14441 "fs: remove no_llseek" Signed-off-by:
take care to handle NULL ->proc_lseek()
Easily done now, just by clearing FMODE_LSEEK in ->f_mode during proc_reg_open() for such entries.
Fixes: 868941b14441 "fs: remove no_llseek" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v5.15.60, v5.15.59, v5.19, v5.15.58 |
|
#
ed8fb78d |
| 23-Jul-2022 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: add some (hopefully) insightful comments
* /proc/${pid}/net status * removing PDE vs last close stuff (again!) * random small stuff
Link: https://lkml.kernel.org/r/YtwrM6sDC0OQ53YB@localhost.
proc: add some (hopefully) insightful comments
* /proc/${pid}/net status * removing PDE vs last close stuff (again!) * random small stuff
Link: https://lkml.kernel.org/r/YtwrM6sDC0OQ53YB@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48 |
|
#
376b0c26 |
| 15-Jun-2022 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: delete unused <linux/uaccess.h> includes
Those aren't necessary after seq files won.
Link: https://lkml.kernel.org/r/YqnA3mS7KBt8Z4If@localhost.localdomain Signed-off-by: Alexey Dobriyan <ado
proc: delete unused <linux/uaccess.h> includes
Those aren't necessary after seq files won.
Link: https://lkml.kernel.org/r/YqnA3mS7KBt8Z4If@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
Revision tags: v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31 |
|
#
fd60b288 |
| 22-Mar-2022 |
Muchun Song <songmuchun@bytedance.com> |
fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kerne
fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb().
Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17 |
|
#
6dfbbae1 |
| 22-Jan-2022 |
Muchun Song <songmuchun@bytedance.com> |
fs: proc: store PDE()->data into inode->i_private
PDE_DATA(inode) is introduced to get user private data and hide the layout of struct proc_dir_entry. The inode->i_private is used to do the same th
fs: proc: store PDE()->data into inode->i_private
PDE_DATA(inode) is introduced to get user private data and hide the layout of struct proc_dir_entry. The inode->i_private is used to do the same thing as well. Save a copy of user private data to inode-> i_private when proc inode is allocated. This means the user also can get their private data by inode->i_private.
Introduce pde_data() to wrap inode->i_private so that we can remove PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of pde_data(). It will be easier if we decide to remove PDE_DATE() in the future.
Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35 |
|
#
1dcdd7ef |
| 06-May-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: delete redundant subset=pid check
Two checks in lookup and readdir code should be enough to not have third check in open code.
Can't open what can't be looked up?
Link: https://lkml.kernel.o
proc: delete redundant subset=pid check
Two checks in lookup and readdir code should be enough to not have third check in open code.
Can't open what can't be looked up?
Link: https://lkml.kernel.org/r/YFYYwIBIkytqnkxP@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
d4455fac |
| 06-May-2021 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: mandate ->proc_lseek in "struct proc_ops"
Now that proc_ops are separate from file_operations and other operations it easy to check all instances to have ->proc_lseek hook and remove check in
proc: mandate ->proc_lseek in "struct proc_ops"
Now that proc_ops are separate from file_operations and other operations it easy to check all instances to have ->proc_lseek hook and remove check in main code.
Note: nonseekable_open() files naturally don't require ->proc_lseek.
Garbage collect pde_lseek() function.
[adobriyan@gmail.com: smoke test lseek()] Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain
Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
fe33850f |
| 04-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
proc: wire up generic_file_splice_read for iter ops
Wire up generic_file_splice_read for the iter based proxy ops, so that splice reads from them work.
Signed-off-by: Christoph Hellwig <hch@lst.de>
proc: wire up generic_file_splice_read for iter ops
Wire up generic_file_splice_read for the iter based proxy ops, so that splice reads from them work.
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7 |
|
#
fd5a13f4 |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: add a read_iter method to proc proc_ops
This will allow proc files to implement iter read semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org
proc: add a read_iter method to proc proc_ops
This will allow proc files to implement iter read semantics.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
906146f4 |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: cleanup the compat vs no compat file ops
Instead of providing a special no-compat version provide a special compat version for operations with ->compat_ioctl.
Signed-off-by: Christoph Hellwig
proc: cleanup the compat vs no compat file ops
Instead of providing a special no-compat version provide a special compat version for operations with ->compat_ioctl.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
f6ef7b7b |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: remove a level of indentation in proc_get_inode
Just return early on inode allocation failure.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
Revision tags: v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47 |
|
#
ef1548ad |
| 12-Jun-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use new_inode not new_inode_pseudo
Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the
proc: Use new_inode not new_inode_pseudo
Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the watch is removed after the unmount of proc when the watcher exits.
Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") made it easier to unmount proc and allowed syzbot to see the problem, but looking at the code it has been around for a long time.
Looking at the code the fsnotify watch should have been removed by fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode was allocated with new_inode_pseudo instead of new_inode so the inode was not on the sb->s_inodes list. Which prevented fsnotify_unmount_inodes from finding the inode and removing the watch as well as made it so the "VFS: Busy inodes after unmount" warning could not find the inodes to warn about them.
Make all of the inodes in proc visible to generic_shutdown_super, and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo. The only functional difference is that new_inode places the inodes on the sb->s_inodes list.
I wrote a small test program and I can verify that without changes it can trigger this issue, and by replacing new_inode_pseudo with new_inode the issues goes away.
Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34 |
|
#
e61bb8b3 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: use named enums for better readability
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org
proc: use named enums for better readability
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
1c6c4d11 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: use human-readable values for hidepid
The hidepid parameter values are becoming more and more and it becomes difficult to remember what each new magic number means.
Backward compatibility is
proc: use human-readable values for hidepid
The hidepid parameter values are becoming more and more and it becomes difficult to remember what each new magic number means.
Backward compatibility is preserved since it is possible to specify numerical value for the hidepid parameter. This does not break the fsconfig since it is not possible to specify a numerical value through it. All numeric values are converted to a string. The type FSCONFIG_SET_BINARY cannot be used to indicate a numerical value.
Selftest has been added to verify this behavior.
Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
6814ef2d |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: add option to mount only a pids subset
This allows to hide all files and directories in the procfs that are not related to tasks.
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Revi
proc: add option to mount only a pids subset
This allows to hide all files and directories in the procfs that are not related to tasks.
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
fa10fed3 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: allow to mount many instances of proc in one pid namespace
This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allo
proc: allow to mount many instances of proc in one pid namespace
This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allow that we have to modernize procfs internals.
1) The main aim of this work is to have on embedded systems one supervisor for apps. Right now we have some lightweight sandbox support, however if we create pid namespacess we have to manages all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace without being able to notice each other. We only want to use mount namespaces, and we want procfs to behave more like a real mount point.
2) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts.
This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not.
By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which users can not.
Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change LSMs should be able to analyze open/read/write/close...
In the new patch set version I removed the 'newinstance' option as suggested by Eric W. Biederman.
Selftest has been added to verify new behavior.
Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.4.33, v5.4.32, v5.4.31 |
|
#
d919b33d |
| 06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"...
Most of fs/proc/inode.
proc: faster open/read/close with "permanent" files
Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"...
Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection.
Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files.
Enable "permanent" flag for
/proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps
More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not.
This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system".
N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6%
Benchmark source:
#include <chrono> #include <iostream> #include <thread> #include <vector>
#include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h>
const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R;
int xxx = 0;
int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); }
void f(int n) { glue(n % NR_CPUS);
while (*(volatile int *)&xxx == 0) { }
for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } }
int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; }
N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]);
for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; }
std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); }
auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' ';
return 0; }
P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization.
[akpm@linux-foundation.org: coding style fixes] Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
904f394e |
| 06-Apr-2020 |
Jules Irenge <jbi.octave@gmail.com> |
fs/proc/inode.c: annotate close_pdeo() for sparse
Fix sparse locking imbalance warning:
warning: context imbalance in close_pdeo() - unexpected unlock
Signed-off-by: Jules Irenge <jbi.octave@gmai
fs/proc/inode.c: annotate close_pdeo() for sparse
Fix sparse locking imbalance warning:
warning: context imbalance in close_pdeo() - unexpected unlock
Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200227201538.GA30462@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22 |
|
#
7bc3e6e5 |
| 19-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use a list of inodes to flush from proc
Rework the flushing of proc to use a list of directory inodes that need to be flushed.
The list is kept on struct pid not on struct task_struct, as the
proc: Use a list of inodes to flush from proc
Rework the flushing of proc to use a list of directory inodes that need to be flushed.
The list is kept on struct pid not on struct task_struct, as there is a fixed connection between proc inodes and pids but at least for the case of de_thread the pid of a task_struct changes.
This removes the dependency on proc_mnt which allows for different mounts of proc having different mount options even in the same pid namespace and this allows for the removal of proc_mnt which will trivially the first mount of proc to honor it's mount options.
This flushing remains an optimization. The functions pid_delete_dentry and pid_revalidate ensure that ordinary dcache management will not attempt to use dentries past the point their respective task has died. When unused the shrinker will eventually be able to remove these dentries.
There is a case in de_thread where proc_flush_pid can be called early for a given pid. Which winds up being safe (if suboptimal) as this is just an optiimization.
Only pid directories are put on the list as the other per pid files are children of those directories and d_invalidate on the directory will get them as well.
So that the pid can be used during flushing it's reference count is taken in release_task and dropped in proc_flush_pid. Further the call of proc_flush_pid is moved after the tasklist_lock is released in release_task so that it is certain that the pid has already been unhashed when flushing it taking place. This removes a small race where a dentry could recreated.
As struct pid is supposed to be small and I need a per pid lock I reuse the only lock that currently exists in struct pid the the wait_pidfd.lock.
The net result is that this adds all of this functionality with just a little extra list management overhead and a single extra pointer in struct pid.
v2: Initialize pid->inodes. I somehow failed to get that initialization into the initial version of the patch. A boot failure was reported by "kernel test robot <lkp@intel.com>", and failure to initialize that pid->inodes matches all of the reported symptoms.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
71448011 |
| 20-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Clear the pieces of proc_inode that proc_evict_inode cares about
This just keeps everything tidier, and allows for using flags like SLAB_TYPESAFE_BY_RCU where slabs are not always cleared befo
proc: Clear the pieces of proc_inode that proc_evict_inode cares about
This just keeps everything tidier, and allows for using flags like SLAB_TYPESAFE_BY_RCU where slabs are not always cleared before reuse. I don't see reuse without reinitializing happening with the proc_inode but I had a false alarm while reworking flushing of proc dentries and indoes when a process dies that caused me to tidy this up.
The code is a little easier to follow and reason about this way so I figured the changes might as well be kept.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
f90f3caf |
| 21-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use d_invalidate in proc_prune_siblings_dcache
The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases for the dca
proc: Use d_invalidate in proc_prune_siblings_dcache
The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases for the dcache or even think of removing mounts from the dcache. For that behavior d_invalidate is needed.
To use d_invalidate replace d_prune_aliases with d_find_alias followed by d_invalidate and dput.
For completeness the directory and the non-directory cases are separated because in theory (although not in currently in practice for proc) directories can only ever have a single dentry while non-directories can have hardlinks and thus multiple dentries. As part of this separation use d_find_any_alias for directories to spare d_find_alias the extra work of doing that.
Plus the differences between d_find_any_alias and d_find_alias makes it clear why the directory and non-directory code and not share code.
To make it clear these routines now invalidate dentries rename proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename proc_sys_prune_dcache proc_sys_invalidate_dcache.
V2: Split the directory and non-directory cases. To make this code robust to future changes in proc.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
080f6276 |
| 21-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: In proc_prune_siblings_dcache cache an aquired super block
Because there are likely to be several sysctls in a row on the same superblock cache the super_block after the count has been raised
proc: In proc_prune_siblings_dcache cache an aquired super block
Because there are likely to be several sysctls in a row on the same superblock cache the super_block after the count has been raised and don't deactivate it until we are processing another super_block.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
26dbc60f |
| 20-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
This prepares the way for allowing the pid part of proc to use this dcache pruning code as well.
Signed-off-by: Eric W. Bieder
proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
This prepares the way for allowing the pid part of proc to use this dcache pruning code as well.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|