Revision tags: v2.6.18-rc1 |
|
#
13b41b09 |
| 26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: Use struct pid not struct task_ref
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so that they work with struct pid instead of struct task_ref.
Mostly this i
[PATCH] proc: Use struct pid not struct task_ref
Incrementally update my proc-dont-lock-task_structs-indefinitely patches so that they work with struct pid instead of struct task_ref.
Mostly this is a straight 1-1 substitution.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
#
99f89551 |
| 26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: don't lock task_structs indefinitely
Every inode in /proc holds a reference to a struct task_struct. If a directory or file is opened and remains open after the the task exits this pi
[PATCH] proc: don't lock task_structs indefinitely
Every inode in /proc holds a reference to a struct task_struct. If a directory or file is opened and remains open after the the task exits this pinning continues. With 8K stacks on a 32bit machine the amount pinned per file descriptor is about 10K.
Normally I would figure a reasonable per user process limit is about 100 processes. With 80 processes, with a 1000 file descriptors each I can trigger the 00M killer on a 32bit kernel, because I have pinned about 800MB of useless data.
This patch replaces the struct task_struct pointer with a pointer to a struct task_ref which has a struct task_struct pointer. The so the pinning of dead tasks does not happen.
The code now has to contend with the fact that the task may now exit at any time. Which is a little but not muh more complicated.
With this change it takes about 1000 processes each opening up 1000 file descriptors before I can trigger the OOM killer. Much better.
[mlp@google.com: task_mmu small fixes] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Paul Jackson <pj@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Albert Cahalan <acahalan@gmail.com> Signed-off-by: Prasanna Meda <mlp@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
#
aed7a6c4 |
| 26-Jun-2006 |
Eric W. Biederman <ebiederm@xmission.com> |
[PATCH] proc: Replace proc_inode.type with proc_inode.fd
The sole renaming use of proc_inode.type is to discover the file descriptor number, so just store the file descriptor number and don't wory a
[PATCH] proc: Replace proc_inode.type with proc_inode.fd
The sole renaming use of proc_inode.type is to discover the file descriptor number, so just store the file descriptor number and don't wory about processing this field. This removes any /proc limits on the maximum number of file descriptors, and clears the path to make the hard coded /proc inode numbers go away.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
Revision tags: v2.6.17, v2.6.17-rc6, v2.6.17-rc5, v2.6.17-rc4, v2.6.17-rc3, v2.6.17-rc2, v2.6.17-rc1 |
|
#
fffb60f9 |
| 24-Mar-2006 |
Paul Jackson <pj@sgi.com> |
[PATCH] cpuset memory spread: slab cache format
Rewrap the overly long source code lines resulting from the previous patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch contains onl
[PATCH] cpuset memory spread: slab cache format
Rewrap the overly long source code lines resulting from the previous patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch contains only formatting changes, and no function change.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
#
4b6a9316 |
| 24-Mar-2006 |
Paul Jackson <pj@sgi.com> |
[PATCH] cpuset memory spread: slab cache filesystems
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD memory spreading.
If a slab cache is marked SLAB_MEM_SPREAD, then anyt
[PATCH] cpuset memory spread: slab cache filesystems
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD memory spreading.
If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's in a cpuset with the 'memory_spread_slab' option enabled goes to allocate from such a slab cache, the allocations are spread evenly over all the memory nodes (task->mems_allowed) allowed to that task, instead of favoring allocation on the node local to the current cpu.
The following inode and similar caches are marked SLAB_MEM_SPREAD:
file cache ==== ===== fs/adfs/super.c adfs_inode_cache fs/affs/super.c affs_inode_cache fs/befs/linuxvfs.c befs_inode_cache fs/bfs/inode.c bfs_inode_cache fs/block_dev.c bdev_cache fs/cifs/cifsfs.c cifs_inode_cache fs/coda/inode.c coda_inode_cache fs/dquot.c dquot fs/efs/super.c efs_inode_cache fs/ext2/super.c ext2_inode_cache fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr fs/ext3/super.c ext3_inode_cache fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr fs/fat/cache.c fat_cache fs/fat/inode.c fat_inode_cache fs/freevxfs/vxfs_super.c vxfs_inode fs/hpfs/super.c hpfs_inode_cache fs/isofs/inode.c isofs_inode_cache fs/jffs/inode-v23.c jffs_fm fs/jffs2/super.c jffs2_i fs/jfs/super.c jfs_ip fs/minix/inode.c minix_inode_cache fs/ncpfs/inode.c ncp_inode_cache fs/nfs/direct.c nfs_direct_cache fs/nfs/inode.c nfs_inode_cache fs/ntfs/super.c ntfs_big_inode_cache_name fs/ntfs/super.c ntfs_inode_cache fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache fs/ocfs2/super.c ocfs2_inode_cache fs/proc/inode.c proc_inode_cache fs/qnx4/inode.c qnx4_inode_cache fs/reiserfs/super.c reiser_inode_cache fs/romfs/inode.c romfs_inode_cache fs/smbfs/inode.c smb_inode_cache fs/sysv/inode.c sysv_inode_cache fs/udf/super.c udf_inode_cache fs/ufs/super.c ufs_inode_cache net/socket.c sock_inode_cache net/sunrpc/rpc_pipe.c rpc_inode_cache
The choice of which slab caches to so mark was quite simple. I marked those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache, inode_cache, and buffer_head, which were marked in a previous patch. Even though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same potentially large file system i/o related slab caches as we need for memory spreading.
Given that the rule now becomes "wherever you would have used a SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain. Future file system writers will just copy one of the existing file system slab cache setups and tend to get it right without thinking.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
Revision tags: v2.6.16, v2.6.16-rc6, v2.6.16-rc5, v2.6.16-rc4, v2.6.16-rc3 |
|
#
76b6159b |
| 08-Feb-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] fix handling of st_nlink on procfs root
1) it should use nr_processes(), not nr_threads; otherwise we are getting very confused find(1) and friends, among other things. 2) better do that at
[PATCH] fix handling of st_nlink on procfs root
1) it should use nr_processes(), not nr_threads; otherwise we are getting very confused find(1) and friends, among other things. 2) better do that at stat() time than at every damn lookup in procfs root.
Patch had been sitting in FC4 kernels for many months now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v2.6.16-rc2, v2.6.16-rc1 |
|
#
fee781e6 |
| 08-Jan-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] fs/proc/: function prototypes belong in header files
Function prototypes belong into header files.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> S
[PATCH] fs/proc/: function prototypes belong in header files
Function prototypes belong into header files.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
Revision tags: v2.6.15, v2.6.15-rc7, v2.6.15-rc6, v2.6.15-rc5, v2.6.15-rc4, v2.6.15-rc3, v2.6.15-rc2, v2.6.15-rc1 |
|
#
e9543659 |
| 30-Oct-2005 |
Kirill Korotaev <dev@sw.ru> |
[PATCH] proc: fix of error path in proc_get_inode()
This patch fixes incorrect error path in proc_get_inode(), when module can't be get due to being unloaded. When try_module_get() fails, this func
[PATCH] proc: fix of error path in proc_get_inode()
This patch fixes incorrect error path in proc_get_inode(), when module can't be get due to being unloaded. When try_module_get() fails, this function puts de(!) and still returns inode with non-getted de.
There are still unresolved known bugs in proc yet to be fixed: - proc_dir_entry tree is managed without any serialization - create_proc_entry() doesn't setup de->owner anyhow, so setting it later manually is inatomic. - looks like almost all modules do not care whether it's de->owner is set...
Signed-Off-By: Denis Lunev <den@sw.ru> Signed-Off-By: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
Revision tags: v2.6.14, v2.6.14-rc5, v2.6.14-rc4, v2.6.14-rc3, v2.6.14-rc2, v2.6.14-rc1 |
|
#
fef26658 |
| 09-Sep-2005 |
Mark Fasheh <mark.fasheh@oracle.com> |
[PATCH] update filesystems for new delete_inode behavior
Update the file systems in fs/ implementing a delete_inode() callback to call truncate_inode_pages(). One implementation note: In developing
[PATCH] update filesystems for new delete_inode behavior
Update the file systems in fs/ implementing a delete_inode() callback to call truncate_inode_pages(). One implementation note: In developing this patch I put the calls to truncate_inode_pages() at the very top of those filesystems delete_inode() callbacks in order to retain the previous behavior. I'm guessing that some of those could probably be optimized.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
show more ...
|
Revision tags: v2.6.13, v2.6.13-rc7, v2.6.13-rc6, v2.6.13-rc5, v2.6.13-rc4, v2.6.13-rc3, v2.6.13-rc2, v2.6.13-rc1, v2.6.12, v2.6.12-rc6, v2.6.12-rc5, v2.6.12-rc4, v2.6.12-rc3, v2.6.12-rc2 |
|
#
1da177e4 |
| 16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in
Linux-2.6.12-rc2
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it.
Let it rip!
show more ...
|
Revision tags: v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
fe33850f |
| 04-Nov-2020 |
Christoph Hellwig <hch@lst.de> |
proc: wire up generic_file_splice_read for iter ops Wire up generic_file_splice_read for the iter based proxy ops, so that splice reads from them work. Signed-off-by: Christoph
proc: wire up generic_file_splice_read for iter ops Wire up generic_file_splice_read for the iter based proxy ops, so that splice reads from them work. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7 |
|
#
fd5a13f4 |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: add a read_iter method to proc proc_ops This will allow proc files to implement iter read semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <
proc: add a read_iter method to proc proc_ops This will allow proc files to implement iter read semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
906146f4 |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: cleanup the compat vs no compat file ops Instead of providing a special no-compat version provide a special compat version for operations with ->compat_ioctl. Signed-off-b
proc: cleanup the compat vs no compat file ops Instead of providing a special no-compat version provide a special compat version for operations with ->compat_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
f6ef7b7b |
| 03-Sep-2020 |
Christoph Hellwig <hch@lst.de> |
proc: remove a level of indentation in proc_get_inode Just return early on inode allocation failure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@z
proc: remove a level of indentation in proc_get_inode Just return early on inode allocation failure. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
Revision tags: v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47 |
|
#
ef1548ad |
| 12-Jun-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use new_inode not new_inode_pseudo Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use afte
proc: Use new_inode not new_inode_pseudo Recently syzbot reported that unmounting proc when there is an ongoing inotify watch on the root directory of proc could result in a use after free when the watch is removed after the unmount of proc when the watcher exits. Commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of proc") made it easier to unmount proc and allowed syzbot to see the problem, but looking at the code it has been around for a long time. Looking at the code the fsnotify watch should have been removed by fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode was allocated with new_inode_pseudo instead of new_inode so the inode was not on the sb->s_inodes list. Which prevented fsnotify_unmount_inodes from finding the inode and removing the watch as well as made it so the "VFS: Busy inodes after unmount" warning could not find the inodes to warn about them. Make all of the inodes in proc visible to generic_shutdown_super, and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo. The only functional difference is that new_inode places the inodes on the sb->s_inodes list. I wrote a small test program and I can verify that without changes it can trigger this issue, and by replacing new_inode_pseudo with new_inode the issues goes away. Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com Fixes: 0097875bd415 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34 |
|
#
e61bb8b3 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: use named enums for better readability Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keesc
proc: use named enums for better readability Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
1c6c4d11 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: use human-readable values for hidepid The hidepid parameter values are becoming more and more and it becomes difficult to remember what each new magic number means. Backwa
proc: use human-readable values for hidepid The hidepid parameter values are becoming more and more and it becomes difficult to remember what each new magic number means. Backward compatibility is preserved since it is possible to specify numerical value for the hidepid parameter. This does not break the fsconfig since it is not possible to specify a numerical value through it. All numeric values are converted to a string. The type FSCONFIG_SET_BINARY cannot be used to indicate a numerical value. Selftest has been added to verify this behavior. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
6814ef2d |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: add option to mount only a pids subset This allows to hide all files and directories in the procfs that are not related to tasks. Signed-off-by: Alexey Gladkov <gladkov.al
proc: add option to mount only a pids subset This allows to hide all files and directories in the procfs that are not related to tasks. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
fa10fed3 |
| 19-Apr-2020 |
Alexey Gladkov <gladkov.alexey@gmail.com> |
proc: allow to mount many instances of proc in one pid namespace This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes,
proc: allow to mount many instances of proc in one pid namespace This patch allows to have multiple procfs instances inside the same pid namespace. The aim here is lightweight sandboxes, and to allow that we have to modernize procfs internals. 1) The main aim of this work is to have on embedded systems one supervisor for apps. Right now we have some lightweight sandbox support, however if we create pid namespacess we have to manages all the processes inside too, where our goal is to be able to run a bunch of apps each one inside its own mount namespace without being able to notice each other. We only want to use mount namespaces, and we want procfs to behave more like a real mount point. 2) Linux Security Modules have multiple ptrace paths inside some subsystems, however inside procfs, the implementation does not guarantee that the ptrace() check which triggers the security_ptrace_check() hook will always run. We have the 'hidepid' mount option that can be used to force the ptrace_may_access() check inside has_pid_permissions() to run. The problem is that 'hidepid' is per pid namespace and not attached to the mount point, any remount or modification of 'hidepid' will propagate to all other procfs mounts. This also does not allow to support Yama LSM easily in desktop and user sessions. Yama ptrace scope which restricts ptrace and some other syscalls to be allowed only on inferiors, can be updated to have a per-task context, where the context will be inherited during fork(), clone() and preserved across execve(). If we support multiple private procfs instances, then we may force the ptrace_may_access() on /proc/<pids>/ to always run inside that new procfs instances. This will allow to specifiy on user sessions if we should populate procfs with pids that the user can ptrace or not. By using Yama ptrace scope, some restricted users will only be able to see inferiors inside /proc, they won't even be able to see their other processes. Some software like Chromium, Firefox's crash handler, Wine and others are already using Yama to restrict which processes can be ptracable. With this change this will give the possibility to restrict /proc/<pids>/ but more importantly this will give desktop users a generic and usuable way to specifiy which users should see all processes and which users can not. Side notes: * This covers the lack of seccomp where it is not able to parse arguments, it is easy to install a seccomp filter on direct syscalls that operate on pids, however /proc/<pid>/ is a Linux ABI using filesystem syscalls. With this change LSMs should be able to analyze open/read/write/close... In the new patch set version I removed the 'newinstance' option as suggested by Eric W. Biederman. Selftest has been added to verify new behavior. Signed-off-by: Alexey Gladkov <gladkov.alexey@gmail.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
Revision tags: v5.4.33, v5.4.32, v5.4.31 |
|
#
d919b33d |
| 06-Apr-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: faster open/read/close with "permanent" files Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Mo
proc: faster open/read/close with "permanent" files Now that "struct proc_ops" exist we can start putting there stuff which could not fly with VFS "struct file_operations"... Most of fs/proc/inode.c file is dedicated to make open/read/.../close reliable in the event of disappearing /proc entries which usually happens if module is getting removed. Files like /proc/cpuinfo which never disappear simply do not need such protection. Save 2 atomic ops, 1 allocation, 1 free per open/read/close sequence for such "permanent" files. Enable "permanent" flag for /proc/cpuinfo /proc/kmsg /proc/modules /proc/slabinfo /proc/stat /proc/sysvipc/* /proc/swaps More will come once I figure out foolproof way to prevent out module authors from marking their stuff "permanent" for performance reasons when it is not. This should help with scalability: benchmark is "read /proc/cpuinfo R times by N threads scattered over the system". N R t, s (before) t, s (after) ----------------------------------------------------- 64 4096 1.582458 1.530502 -3.2% 256 4096 6.371926 6.125168 -3.9% 1024 4096 25.64888 24.47528 -4.6% Benchmark source: #include <chrono> #include <iostream> #include <thread> #include <vector> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> const int NR_CPUS = sysconf(_SC_NPROCESSORS_ONLN); int N; const char *filename; int R; int xxx = 0; int glue(int n) { cpu_set_t m; CPU_ZERO(&m); CPU_SET(n, &m); return sched_setaffinity(0, sizeof(cpu_set_t), &m); } void f(int n) { glue(n % NR_CPUS); while (*(volatile int *)&xxx == 0) { } for (int i = 0; i < R; i++) { int fd = open(filename, O_RDONLY); char buf[4096]; ssize_t rv = read(fd, buf, sizeof(buf)); asm volatile ("" :: "g" (rv)); close(fd); } } int main(int argc, char *argv[]) { if (argc < 4) { std::cerr << "usage: " << argv[0] << ' ' << "N /proc/filename R "; return 1; } N = atoi(argv[1]); filename = argv[2]; R = atoi(argv[3]); for (int i = 0; i < NR_CPUS; i++) { if (glue(i) == 0) break; } std::vector<std::thread> T; T.reserve(N); for (int i = 0; i < N; i++) { T.emplace_back(f, i); } auto t0 = std::chrono::system_clock::now(); { *(volatile int *)&xxx = 1; for (auto& t: T) { t.join(); } } auto t1 = std::chrono::system_clock::now(); std::chrono::duration<double> dt = t1 - t0; std::cout << dt.count() << ' '; return 0; } P.S.: Explicit randomization marker is added because adding non-function pointer will silently disable structure layout randomization. [akpm@linux-foundation.org: coding style fixes] Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Link: http://lkml.kernel.org/r/20200222201539.GA22576@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
904f394e |
| 06-Apr-2020 |
Jules Irenge <jbi.octave@gmail.com> |
fs/proc/inode.c: annotate close_pdeo() for sparse Fix sparse locking imbalance warning: warning: context imbalance in close_pdeo() - unexpected unlock Signed-off-by
fs/proc/inode.c: annotate close_pdeo() for sparse Fix sparse locking imbalance warning: warning: context imbalance in close_pdeo() - unexpected unlock Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20200227201538.GA30462@avx2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22 |
|
#
7bc3e6e5 |
| 19-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use a list of inodes to flush from proc Rework the flushing of proc to use a list of directory inodes that need to be flushed. The list is kept on struct pid not on struct
proc: Use a list of inodes to flush from proc Rework the flushing of proc to use a list of directory inodes that need to be flushed. The list is kept on struct pid not on struct task_struct, as there is a fixed connection between proc inodes and pids but at least for the case of de_thread the pid of a task_struct changes. This removes the dependency on proc_mnt which allows for different mounts of proc having different mount options even in the same pid namespace and this allows for the removal of proc_mnt which will trivially the first mount of proc to honor it's mount options. This flushing remains an optimization. The functions pid_delete_dentry and pid_revalidate ensure that ordinary dcache management will not attempt to use dentries past the point their respective task has died. When unused the shrinker will eventually be able to remove these dentries. There is a case in de_thread where proc_flush_pid can be called early for a given pid. Which winds up being safe (if suboptimal) as this is just an optiimization. Only pid directories are put on the list as the other per pid files are children of those directories and d_invalidate on the directory will get them as well. So that the pid can be used during flushing it's reference count is taken in release_task and dropped in proc_flush_pid. Further the call of proc_flush_pid is moved after the tasklist_lock is released in release_task so that it is certain that the pid has already been unhashed when flushing it taking place. This removes a small race where a dentry could recreated. As struct pid is supposed to be small and I need a per pid lock I reuse the only lock that currently exists in struct pid the the wait_pidfd.lock. The net result is that this adds all of this functionality with just a little extra list management overhead and a single extra pointer in struct pid. v2: Initialize pid->inodes. I somehow failed to get that initialization into the initial version of the patch. A boot failure was reported by "kernel test robot <lkp@intel.com>", and failure to initialize that pid->inodes matches all of the reported symptoms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
show more ...
|
#
71448011 |
| 20-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Clear the pieces of proc_inode that proc_evict_inode cares about This just keeps everything tidier, and allows for using flags like SLAB_TYPESAFE_BY_RCU where slabs are not always
proc: Clear the pieces of proc_inode that proc_evict_inode cares about This just keeps everything tidier, and allows for using flags like SLAB_TYPESAFE_BY_RCU where slabs are not always cleared before reuse. I don't see reuse without reinitializing happening with the proc_inode but I had a false alarm while reworking flushing of proc dentries and indoes when a process dies that caused me to tidy this up. The code is a little easier to follow and reason about this way so I figured the changes might as well be kept. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
f90f3caf |
| 21-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: Use d_invalidate in proc_prune_siblings_dcache The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases
proc: Use d_invalidate in proc_prune_siblings_dcache The function d_prune_aliases has the problem that it will only prune aliases thare are completely unused. It will not remove aliases for the dcache or even think of removing mounts from the dcache. For that behavior d_invalidate is needed. To use d_invalidate replace d_prune_aliases with d_find_alias followed by d_invalidate and dput. For completeness the directory and the non-directory cases are separated because in theory (although not in currently in practice for proc) directories can only ever have a single dentry while non-directories can have hardlinks and thus multiple dentries. As part of this separation use d_find_any_alias for directories to spare d_find_alias the extra work of doing that. Plus the differences between d_find_any_alias and d_find_alias makes it clear why the directory and non-directory code and not share code. To make it clear these routines now invalidate dentries rename proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename proc_sys_prune_dcache proc_sys_invalidate_dcache. V2: Split the directory and non-directory cases. To make this code robust to future changes in proc. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
080f6276 |
| 21-Feb-2020 |
Eric W. Biederman <ebiederm@xmission.com> |
proc: In proc_prune_siblings_dcache cache an aquired super block Because there are likely to be several sysctls in a row on the same superblock cache the super_block after the count has
proc: In proc_prune_siblings_dcache cache an aquired super block Because there are likely to be several sysctls in a row on the same superblock cache the super_block after the count has been raised and don't deactivate it until we are processing another super_block. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|