History log of /openbmc/linux/fs/proc/proc_sysctl.c (Results 201 – 225 of 362)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 7708bfb1 29-Apr-2008 Pavel Emelyanov <xemul@openvz.org>

sysctl: merge equal proc_sys_read and proc_sys_write

Many (most of) sysctls do not have a per-container sense. E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget and so on
and

sysctl: merge equal proc_sys_read and proc_sys_write

Many (most of) sysctls do not have a per-container sense. E.g.
kernel.print_fatal_signals, vm.panic_on_oom, net.core.netdev_budget and so on
and so forth. Besides, tuning then from inside a container is not even
secure. On the other hand, hiding them completely from the container's tasks
sometimes causes user-space to stop working.

When developing net sysctl, the common practice was to duplicate a table and
drop the write bits in table->mode, but this approach was not very elegant,
lead to excessive memory consumption and was not suitable in general.

Here's the alternative solution. To facilitate the per-container sysctls
ctl_table_root-s were introduced. Each root contains a list of
ctl_table_header-s that are visible to different namespaces. The idea of this
set is to add the permissions() callback on the ctl_table_root to allow ctl
root limit permissions to the same ctl_table-s.

The main user of this functionality is the net-namespaces code, but later this
will (should) be used by more and more namespaces, containers and control
groups.

Actually, this idea's core is in a single hunk in the third patch. First two
patches are cleanups for sysctl code, while the third one mostly extends the
arguments set of some sysctl functions.

This patch:

These ->read and ->write callbacks act in a very similar way, so merge these
paths to reduce the number of places to patch later and shrink the .text size
(a bit).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.25, v2.6.25-rc9, v2.6.25-rc8, v2.6.25-rc7, v2.6.25-rc6, v2.6.25-rc5, v2.6.25-rc4, v2.6.25-rc3, v2.6.25-rc2
# 4ac91378 14-Feb-2008 Jan Blunck <jblunck@suse.de>

Embed a struct path into struct nameidata instead of nd->{dentry,mnt}

This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for

Embed a struct path into struct nameidata instead of nd->{dentry,mnt}

This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux

with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.25-rc1
# 03a44825 08-Feb-2008 Jan Engelhardt <jengelh@computergmbh.de>

procfs: constify function pointer tables

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acke

procfs: constify function pointer tables

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-By: David Howells <dhowells@redhat.com>
Acked-by: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.24, v2.6.24-rc8, v2.6.24-rc7, v2.6.24-rc6, v2.6.24-rc5, v2.6.24-rc4, v2.6.24-rc3, v2.6.24-rc2
# 2a2da53b 25-Oct-2007 David Howells <dhowells@redhat.com>

Fix pointer mismatches in proc_sysctl.c

Fix pointer mismatches in proc_sysctl.c. The proc_handler() method returns a
size_t through an arg pointer, but is given a pointer to a ssize_t to return
int

Fix pointer mismatches in proc_sysctl.c

Fix pointer mismatches in proc_sysctl.c. The proc_handler() method returns a
size_t through an arg pointer, but is given a pointer to a ssize_t to return
into.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.24-rc1, v2.6.23, v2.6.23-rc9, v2.6.23-rc8, v2.6.23-rc7, v2.6.23-rc6, v2.6.23-rc5, v2.6.23-rc4, v2.6.23-rc3, v2.6.23-rc2, v2.6.23-rc1, v2.6.22, v2.6.22-rc7, v2.6.22-rc6, v2.6.22-rc5, v2.6.22-rc4, v2.6.22-rc3, v2.6.22-rc2, v2.6.22-rc1
# 9d0633cf 08-May-2007 John Johansen <jjohansen@suse.de>

Remove redundant check from proc_sys_setattr()

notify_change() already calls security_inode_setattr() before
calling iop->setattr.

Alan sayeth

This is a behaviour change on all of these and limi

Remove redundant check from proc_sys_setattr()

notify_change() already calls security_inode_setattr() before
calling iop->setattr.

Alan sayeth

This is a behaviour change on all of these and limits some behaviour of
existing established security modules

When inode_change_ok is called it has side effects. This includes
clearing the SGID bit on attribute changes caused by chmod. If you make
this change the results of some rulesets may be different before or after
the change is made.

I'm not saying the change is wrong but it does change behaviour so that
needs looking at closely (ditto all other attribute twiddles)

Signed-off-by: Steve Beattie <sbeattie@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: John Johansen <jjohansen@suse.de>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.21, v2.6.21-rc7, v2.6.21-rc6, v2.6.21-rc5, v2.6.21-rc4, v2.6.21-rc3, v2.6.21-rc2, v2.6.21-rc1
# 86a71dbd 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] sysctl: hide the sysctl proc inodes from selinux

Since the security checks are applied on each read and write of a sysctl file,
just like they are applied when calling sys_sysctl, they are r

[PATCH] sysctl: hide the sysctl proc inodes from selinux

Since the security checks are applied on each read and write of a sysctl file,
just like they are applied when calling sys_sysctl, they are redundant on the
standard VFS constructs. Since it is difficult to compute the security labels
on the standard VFS constructs we just mark the sysctl inodes in proc private
so selinux won't even bother with them.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 77b14db5 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com>

[PATCH] sysctl: reimplement the sysctl proc support

With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save ab

[PATCH] sysctl: reimplement the sysctl proc support

With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# b7925acd 25-Feb-2021 Josef Bacik <josef@toxicpanda.com>

proc: use kvzalloc for our kernel buffer

[ Upstream commit 4508943794efdd94171549c0bd52810e2f4ad9fe ]

Since

sysctl: pass kernel pointers to ->proc_handler

we hav

proc: use kvzalloc for our kernel buffer

[ Upstream commit 4508943794efdd94171549c0bd52810e2f4ad9fe ]

Since

sysctl: pass kernel pointers to ->proc_handler

we have been pre-allocating a buffer to copy the data from the proc
handlers into, and then copying that to userspace. The problem is this
just blindly kzalloc()'s the buffer size passed in from the read, which in
the case of our 'cat' binary was 64kib. Order-4 allocations are not
awesome, and since we can potentially allocate up to our maximum order, so
use kvzalloc for these buffers.

[willy@infradead.org: changelog tweaks]

Link: https://lkml.kernel.org/r/6345270a2c1160b89dd5e6715461f388176899d1.1612972413.git.josef@toxicpanda.com
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

show more ...


# cb5fe25c 23-Jan-2021 Xiaoming Ni <nixiaoming@huawei.com>

proc_sysctl: fix oops caused by incorrect command parameters

commit 697edcb0e4eadc41645fe88c991fe6a206b1a08d upstream.

The process_sysctl_arg() does not check whether val is empty b

proc_sysctl: fix oops caused by incorrect command parameters

commit 697edcb0e4eadc41645fe88c991fe6a206b1a08d upstream.

The process_sysctl_arg() does not check whether val is empty before
invoking strlen(val). If the command line parameter () is incorrectly
configured and val is empty, oops is triggered.

For example:
"hung_task_panic=1" is incorrectly written as "hung_task_panic", oops is
triggered. The call stack is as follows:
Kernel command line: .... hung_task_panic
......
Call trace:
__pi_strlen+0x10/0x98
parse_args+0x278/0x344
do_sysctl_args+0x8c/0xfc
kernel_init+0x5c/0xf4
ret_from_fork+0x10/0x30

To fix it, check whether "val" is empty when "phram" is a sysctl field.
Error codes are returned in the failure branch, and error logs are
generated by parse_args().

Link: https://lkml.kernel.org/r/20210118133029.28580-1-nixiaoming@huawei.com
Fixes: 3db978d480e2843 ("kernel/sysctl: support setting sysctl parameters from kernel command line")
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: <stable@vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


Revision tags: v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7
# 4bd6a735 03-Sep-2020 Matthew Wilcox (Oracle) <willy@infradead.org>

sysctl: Convert to iter interfaces

Using the read_iter/write_iter interfaces allows for in-kernel users
to set sysctls without using set_fs(). Also, the buffer is a string,
so give

sysctl: Convert to iter interfaces

Using the read_iter/write_iter interfaces allows for in-kernel users
to set sysctls without using set_fs(). Also, the buffer is a string,
so give it the real type of 'char *', not void *.

[AV: Christoph's fixup folded in]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


Revision tags: v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51
# d4d80e69 03-Jul-2020 Matthew Wilcox (Oracle) <willy@infradead.org>

Call sysctl_head_finish on error

This error path returned directly instead of calling sysctl_head_finish().

Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files

Call sysctl_head_finish on error

This error path returned directly instead of calling sysctl_head_finish().

Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


# 1c383726 10-Jun-2020 Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull sysctl fixes from Al Viro:
"Fixups to regressions in sysctl series"

* 'work.sysctl' of

Merge branch 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull sysctl fixes from Al Viro:
"Fixups to regressions in sysctl series"

* 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
sysctl: reject gigantic reads/write to sysctl files
cdrom: fix an incorrect __user annotation on cdrom_sysctl_info
trace: fix an incorrect __user annotation on stack_trace_sysctl
random: fix an incorrect __user annotation on proc_do_entropy
net/sysctl: remove leftover __user annotations on neigh_proc_dointvec*
net/sysctl: use cpumask_parse in flow_limit_cpu_sysctl

show more ...


Revision tags: v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2
# ef9d965b 09-Jun-2020 Christoph Hellwig <hch@lst.de>

sysctl: reject gigantic reads/write to sysctl files

Instead of triggering a WARN_ON deep down in the page allocator just
give up early on allocations that are way larger than the usual s

sysctl: reject gigantic reads/write to sysctl files

Instead of triggering a WARN_ON deep down in the page allocator just
give up early on allocations that are way larger than the usual sysctl
values.

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


# f117955a 07-Jun-2020 Guilherme G. Piccoli <gpiccoli@canonical.com>

kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases

After a recent change introduced by Vlastimil's series [0], kernel is
able now to handle sysctl parameters

kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases

After a recent change introduced by Vlastimil's series [0], kernel is
able now to handle sysctl parameters on kernel command line; also, the
series introduced a simple infrastructure to convert legacy boot
parameters (that duplicate sysctls) into sysctl aliases.

This patch converts the watchdog parameters softlockup_panic and
{hard,soft}lockup_all_cpu_backtrace to use the new alias infrastructure.
It fixes the documentation too, since the alias only accepts values 0 or
1, not the full range of integers.

We also took the opportunity here to improve the documentation of the
previously converted hung_task_panic (see the patch series [0]) and put
the alias table in alphabetical order.

[0] http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Link: http://lkml.kernel.org/r/20200507214624.21911-1-gpiccoli@canonical.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# b467f3ef 07-Jun-2020 Vlastimil Babka <vbabka@suse.cz>

kernel/hung_task convert hung_task_panic boot parameter to sysctl

We can now handle sysctl parameters on kernel command line and have
infrastructure to convert legacy command line option

kernel/hung_task convert hung_task_panic boot parameter to sysctl

We can now handle sysctl parameters on kernel command line and have
infrastructure to convert legacy command line options that duplicate
sysctl to become a sysctl alias.

This patch converts the hung_task_panic parameter. Note that the sysctl
handler is more strict and allows only 0 and 1, while the legacy
parameter allowed any non-zero value. But there is little reason anyone
would not be using 1.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200427180433.7029-4-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 0a477e1a 07-Jun-2020 Vlastimil Babka <vbabka@suse.cz>

kernel/sysctl: support handling command line aliases

We can now handle sysctl parameters on kernel command line, but
historically some parameters introduced their own command line
eq

kernel/sysctl: support handling command line aliases

We can now handle sysctl parameters on kernel command line, but
historically some parameters introduced their own command line
equivalent, which we don't want to remove for compatibility reasons.

We can, however, convert them to the generic infrastructure with a table
translating the legacy command line parameters to their sysctl names,
and removing the one-off param handlers.

This patch adds the support and makes the first conversion to
demonstrate it, on the (deprecated) numa_zonelist_order parameter.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20200427180433.7029-3-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 3db978d4 07-Jun-2020 Vlastimil Babka <vbabka@suse.cz>

kernel/sysctl: support setting sysctl parameters from kernel command line

Patch series "support setting sysctl parameters from kernel command line", v3.

This series adds support for

kernel/sysctl: support setting sysctl parameters from kernel command line

Patch series "support setting sysctl parameters from kernel command line", v3.

This series adds support for something that seems like many people
always wanted but nobody added it yet, so here's the ability to set
sysctl parameters via kernel command line options in the form of
sysctl.vm.something=1

The important part is Patch 1. The second, not so important part is an
attempt to clean up legacy one-off parameters that do the same thing as
a sysctl. I don't want to remove them completely for compatibility
reasons, but with generic sysctl support the idea is to remove the
one-off param handlers and treat the parameters as aliases for the
sysctl variants.

I have identified several parameters that mention sysctl counterparts in
Documentation/admin-guide/kernel-parameters.txt but there might be more.
The conversion also has varying level of success:

- numa_zonelist_order is converted in Patch 2 together with adding the
necessary infrastructure. It's easy as it doesn't really do anything
but warn on deprecated value these days.

- hung_task_panic is converted in Patch 3, but there's a downside that
now it only accepts 0 and 1, while previously it was any integer
value

- nmi_watchdog maps to two sysctls nmi_watchdog and hardlockup_panic,
so there's no straighforward conversion possible

- traceoff_on_warning is a flag without value and it would be required
to handle that somehow in the conversion infractructure, which seems
pointless for a single flag

This patch (of 5):

A recently proposed patch to add vm_swappiness command line parameter in
addition to existing sysctl [1] made me wonder why we don't have a
general support for passing sysctl parameters via command line.

Googling found only somebody else wondering the same [2], but I haven't
found any prior discussion with reasons why not to do this.

Settings the vm_swappiness issue aside (the underlying issue might be
solved in a different way), quick search of kernel-parameters.txt shows
there are already some that exist as both sysctl and kernel parameter -
hung_task_panic, nmi_watchdog, numa_zonelist_order, traceoff_on_warning.

A general mechanism would remove the need to add more of those one-offs
and might be handy in situations where configuration by e.g.
/etc/sysctl.d/ is impractical.

Hence, this patch adds a new parse_args() pass that looks for parameters
prefixed by 'sysctl.' and tries to interpret them as writes to the
corresponding sys/ files using an temporary in-kernel procfs mount.
This mechanism was suggested by Eric W. Biederman [3], as it handles
all dynamically registered sysctl tables, even though we don't handle
modular sysctls. Errors due to e.g. invalid parameter name or value
are reported in the kernel log.

The processing is hooked right before the init process is loaded, as
some handlers might be more complicated than simple setters and might
need some subsystems to be initialized. At the moment the init process
can be started and eventually execute a process writing to /proc/sys/
then it should be also fine to do that from the kernel.

Sysctls registered later on module load time are not set by this
mechanism - it's expected that in such scenarios, setting sysctl values
from userspace is practical enough.

[1] https://lore.kernel.org/r/BL0PR02MB560167492CA4094C91589930E9FC0@BL0PR02MB5601.namprd02.prod.outlook.com/
[2] https://unix.stackexchange.com/questions/558802/how-to-set-sysctl-using-kernel-command-line-parameter
[3] https://lore.kernel.org/r/87bloj2skm.fsf@x220.int.ebiederm.org/

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Ivan Teterevkov <ivan.teterevkov@nutanix.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Guilherme G . Piccoli" <gpiccoli@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Link: http://lkml.kernel.org/r/20200427180433.7029-1-vbabka@suse.cz
Link: http://lkml.kernel.org/r/20200427180433.7029-2-vbabka@suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36
# 32927393 24-Apr-2020 Christoph Hellwig <hch@lst.de>

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


Revision tags: v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22
# f90f3caf 21-Feb-2020 Eric W. Biederman <ebiederm@xmission.com>

proc: Use d_invalidate in proc_prune_siblings_dcache

The function d_prune_aliases has the problem that it will only prune
aliases thare are completely unused. It will not remove aliases

proc: Use d_invalidate in proc_prune_siblings_dcache

The function d_prune_aliases has the problem that it will only prune
aliases thare are completely unused. It will not remove aliases for
the dcache or even think of removing mounts from the dcache. For that
behavior d_invalidate is needed.

To use d_invalidate replace d_prune_aliases with d_find_alias followed
by d_invalidate and dput.

For completeness the directory and the non-directory cases are
separated because in theory (although not in currently in practice for
proc) directories can only ever have a single dentry while
non-directories can have hardlinks and thus multiple dentries.
As part of this separation use d_find_any_alias for directories
to spare d_find_alias the extra work of doing that.

Plus the differences between d_find_any_alias and d_find_alias makes
it clear why the directory and non-directory code and not share code.

To make it clear these routines now invalidate dentries rename
proc_prune_siblings_dache to proc_invalidate_siblings_dcache, and rename
proc_sys_prune_dcache proc_sys_invalidate_dcache.

V2: Split the directory and non-directory cases. To make this
code robust to future changes in proc.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

show more ...


# 26dbc60f 20-Feb-2020 Eric W. Biederman <ebiederm@xmission.com>

proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache

This prepares the way for allowing the pid part of proc to use this
dcache pruning code as well.

Signed-of

proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache

This prepares the way for allowing the pid part of proc to use this
dcache pruning code as well.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

show more ...


# 0afa5ca8 19-Feb-2020 Eric W. Biederman <ebiederm@xmission.com>

proc: Rename in proc_inode rename sysctl_inodes sibling_inodes

I about to need and use the same functionality for pid based
inodes and there is no point in adding a second field when

proc: Rename in proc_inode rename sysctl_inodes sibling_inodes

I about to need and use the same functionality for pid based
inodes and there is no point in adding a second field when
this field is already here and serving the same purporse.

Just give the field a generic name so it is clear that
it is no longer sysctl specific.

Also for good measure initialize sibling_inodes when
proc_inode is initialized.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

show more ...


Revision tags: v5.4.21, v5.4.20, v5.4.19, v5.4.18
# d56c0d45 03-Feb-2020 Alexey Dobriyan <adobriyan@gmail.com>

proc: decouple proc from VFS with "struct proc_ops"

Currently core /proc code uses "struct file_operations" for custom hooks,
however, VFS doesn't directly call them. Every time VFS exp

proc: decouple proc from VFS with "struct proc_ops"

Currently core /proc code uses "struct file_operations" for custom hooks,
however, VFS doesn't directly call them. Every time VFS expands
file_operations hook set, /proc code bloats for no reason.

Introduce "struct proc_ops" which contains only those hooks which /proc
allows to call into (open, release, read, write, ioctl, mmap, poll). It
doesn't contain module pointer as well.

Save ~184 bytes per usage:

add/remove: 26/26 grow/shrink: 1/4 up/down: 1922/-6674 (-4752)
Function old new delta
sysvipc_proc_ops - 72 +72
...
config_gz_proc_ops - 72 +72
proc_get_inode 289 339 +50
proc_reg_get_unmapped_area 110 107 -3
close_pdeo 227 224 -3
proc_reg_open 289 284 -5
proc_create_data 60 53 -7
rt_cpu_seq_fops 256 - -256
...
default_affinity_proc_fops 256 - -256
Total: Before=5430095, After=5425343, chg -0.09%

Link: http://lkml.kernel.org/r/20191225172228.GA13378@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2
# eec4844f 18-Jul-2019 Matteo Croce <mcroce@redhat.com>

proc/sysctl: add shared variables for range check

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This

proc/sysctl: add shared variables for range check

In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.

On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.

The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:

$ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
248

Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.

This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:

# scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
Data old new delta
sysctl_vals - 12 +12
__kstrtab_sysctl_vals - 12 +12
max 14 10 -4
int_max 16 - -16
one 68 - -68
zero 128 28 -100
Total: Before=20583249, After=20583085, chg -0.00%

[mcroce@redhat.com: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
[akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
[arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
[akpm@linux-foundation.org: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 5ec27ec7 16-Jul-2019 Radoslaw Burny <rburny@google.com>

fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.

Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavi

fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.

Normally, the inode's i_uid/i_gid are translated relative to s_user_ns,
but this is not a correct behavior for proc. Since sysctl permission
check in test_perm is done against GLOBAL_ROOT_[UG]ID, it makes more
sense to use these values in u_[ug]id of proc inodes. In other words:
although uid/gid in the inode is not read during test_perm, the inode
logically belongs to the root of the namespace. I have confirmed this
with Eric Biederman at LPC and in this thread:
https://lore.kernel.org/lkml/87k1kzjdff.fsf@xmission.com

Consequences
============

Since the i_[ug]id values of proc nodes are not used for permissions
checks, this change usually makes no functional difference. However, it
causes an issue in a setup where:

* a namespace container is created without root user in container -
hence the i_[ug]id of proc nodes are set to INVALID_[UG]ID

* container creator tries to configure it by writing /proc/sys files,
e.g. writing /proc/sys/kernel/shmmax to configure shared memory limit

Kernel does not allow to open an inode for writing if its i_[ug]id are
invalid, making it impossible to write shmmax and thus - configure the
container.

Using a container with no root mapping is apparently rare, but we do use
this configuration at Google. Also, we use a generic tool to configure
the container limits, and the inability to write any of them causes a
failure.

History
=======

The invalid uids/gids in inodes first appeared due to 81754357770e (fs:
Update i_[ug]id_(read|write) to translate relative to s_user_ns).
However, AFAIK, this did not immediately cause any issues. The
inability to write to these "invalid" inodes was only caused by a later
commit 0bd23d09b874 (vfs: Don't modify inodes with a uid or gid unknown
to the vfs).

Tested: Used a repro program that creates a user namespace without any
mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
Before the change, it shows the overflow uid, with the change it's 0.
The overflow uid indicates that the uid in the inode is not correct and
thus it is not possible to open the file for writing.

Link: http://lkml.kernel.org/r/20190708115130.250149-1-rburny@google.com
Fixes: 0bd23d09b874 ("vfs: Don't modify inodes with a uid or gid unknown to the vfs")
Signed-off-by: Radoslaw Burny <rburny@google.com>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# ff24e498 02-May-2019 David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Three trivial overlapping conflicts.

Signed-off-by: David S. Miller <davem@davemloft.net>


12345678910>>...15