Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3 |
|
#
33195560 |
| 08-Sep-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Handle vfork clone failure correctly
Shadow stacks are allocated automatically and freed on exit, depending on the clone flags. The two cases where new shadow stacks are not allocated are
x86/shstk: Handle vfork clone failure correctly
Shadow stacks are allocated automatically and freed on exit, depending on the clone flags. The two cases where new shadow stacks are not allocated are !CLONE_VM (fork()) and CLONE_VFORK (vfork()). For !CLONE_VM, although a new stack is not allocated, it can be freed normally because it will happen in the child's copy of the VM.
However, for CLONE_VFORK the parent and the child are actually using the same shadow stack. So the kernel doesn't need to allocate *or* free a shadow stack for a CLONE_VFORK child. CLONE_VFORK children already need special tracking to avoid returning to userspace until the child exits or execs. Shadow stack uses this same tracking to avoid freeing CLONE_VFORK shadow stacks.
However, the tracking is not setup until the clone has succeeded (internally). Which means, if a CLONE_VFORK fails, the existing logic will not know it is a CLONE_VFORK and proceed to unmap the parents shadow stack. This error handling cleanup logic runs via exit_thread() in the bad_fork_cleanup_thread label in copy_process(). The issue was seen in the glibc test "posix/tst-spawn3-pidfd" while running with shadow stack using currently out-of-tree glibc patches.
Fix it by not unmapping the vfork shadow stack in the error case as well. Since clone is implemented in core code, it is not ideal to pass the clone flags along the error path in order to have shadow stack code have symmetric logic in the freeing half of the thread shadow stack handling.
Instead use the existing state for thread shadow stacks to track whether the thread is managing its own shadow stack. For CLONE_VFORK, simply set shstk->base and shstk->size to 0, and have it mean the thread is not managing a shadow stack and so should skip cleanup work. Implement this by breaking up the CLONE_VFORK and !CLONE_VM cases in shstk_alloc_thread_stack() to separate conditionals since, the logic is now different between them. In the case of CLONE_VFORK && !CLONE_VM, the existing behavior is to not clean up the shadow stack in the child (which should go away quickly with either be exit or exec), so maintain that behavior by handling the CLONE_VFORK case first in the allocation path.
This new logioc cleanly handles the case of normal, successful CLONE_VFORK's skipping cleaning up their shadow stack's on exit as well. So remove the existing, vfork shadow stack freeing logic. This is in deactivate_mm() where vfork_done is used to tell if it is a vfork child that can skip cleaning up the thread shadow stack.
Fixes: b2926a36b97a ("x86/shstk: Handle thread shadow stack") Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: H.J. Lu <hjl.tools@gmail.com> Link: https://lore.kernel.org/all/20230908203655.543765-2-rick.p.edgecombe%40intel.com
show more ...
|
Revision tags: v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34 |
|
#
b2926a36 |
| 12-Jun-2023 |
Rick Edgecombe <rick.p.edgecombe@intel.com> |
x86/shstk: Handle thread shadow stack
When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflic
x86/shstk: Handle thread shadow stack
When a process is duplicated, but the child shares the address space with the parent, there is potential for the threads sharing a single stack to cause conflicts for each other. In the normal non-CET case this is handled in two ways.
With regular CLONE_VM a new stack is provided by userspace such that the parent and child have different stacks.
For vfork, the parent is suspended until the child exits. So as long as the child doesn't return from the vfork()/CLONE_VFORK calling function and sticks to a limited set of operations, the parent and child can share the same stack.
For shadow stack, these scenarios present similar sharing problems. For the CLONE_VM case, the child and the parent must have separate shadow stacks. Instead of changing clone to take a shadow stack, have the kernel just allocate one and switch to it.
Use stack_size passed from clone3() syscall for thread shadow stack size. A compat-mode thread shadow stack size is further reduced to 1/4. This allows more threads to run in a 32-bit address space. The clone() does not pass stack_size, which was added to clone3(). In that case, use RLIMIT_STACK size and cap to 4 GB.
For shadow stack enabled vfork(), the parent and child can share the same shadow stack, like they can share a normal stack. Since the parent is suspended until the child terminates, the child will not interfere with the parent while executing as long as it doesn't return from the vfork() and overwrite up the shadow stack. The child can safely overwrite down the shadow stack, as the parent can just overwrite this later. So CET does not add any additional limitations for vfork().
Free the shadow stack on thread exit by doing it in mm_release(). Skip this when exiting a vfork() child since the stack is shared in the parent.
During this operation, the shadow stack pointer of the new thread needs to be updated to point to the newly allocated shadow stack. Since the ability to do this is confined to the FPU subsystem, change fpu_clone() to take the new shadow stack pointer, and update it internally inside the FPU subsystem. This part was suggested by Thomas Gleixner.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-30-rick.p.edgecombe%40intel.com
show more ...
|
Revision tags: v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19 |
|
#
23e5d9ec |
| 12-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect canonical addresses. An attempt to pass down tagged pointer will lead to ad
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
IOMMU and SVA-capable devices know nothing about LAM and only expect canonical addresses. An attempt to pass down tagged pointer will lead to address translation failure.
By default do not allow to enable both LAM and use SVA in the same process.
The new ARCH_FORCE_TAGGED_SVA arch_prctl() overrides the limitation. By using the arch_prctl() userspace takes responsibility to never pass tagged address to the device.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Ashok Raj <ashok.raj@intel.com> Reviewed-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230312112612.31869-12-kirill.shutemov%40linux.intel.com
show more ...
|
#
f7d30434 |
| 12-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm: Expose untagging mask in /proc/$PID/status
Add a line in /proc/$PID/status to report untag_mask. It can be used to find out LAM status of the process from the outside. It is useful for debuggers
mm: Expose untagging mask in /proc/$PID/status
Add a line in /proc/$PID/status to report untag_mask. It can be used to find out LAM status of the process from the outside. It is useful for debuggers.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-10-kirill.shutemov%40linux.intel.com
show more ...
|
#
74c228d2 |
| 12-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/uaccess: Provide untagged_addr() and remove tags before address check
untagged_addr() is a helper used by the core-mm to strip tag bits and get the address to the canonical shape based on rules
x86/uaccess: Provide untagged_addr() and remove tags before address check
untagged_addr() is a helper used by the core-mm to strip tag bits and get the address to the canonical shape based on rules of the current thread. It only handles userspace addresses.
The untagging mask is stored in per-CPU variable and set on context switching to the task.
The tags must not be included into check whether it's okay to access the userspace address. Strip tags in access_ok().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-7-kirill.shutemov%40linux.intel.com
show more ...
|
#
82721d8b |
| 12-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/mm: Handle LAM on context switch
Linear Address Masking mode for userspace pointers encoded in CR3 bits. The mode is selected per-process and stored in mm_context_t.
switch_mm_irqs_off() now re
x86/mm: Handle LAM on context switch
Linear Address Masking mode for userspace pointers encoded in CR3 bits. The mode is selected per-process and stored in mm_context_t.
switch_mm_irqs_off() now respects selected LAM mode and constructs CR3 accordingly.
The active LAM mode gets recorded in the tlb_state.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-5-kirill.shutemov%40linux.intel.com
show more ...
|
#
5ef495e5 |
| 12-Mar-2023 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86: Allow atomic MM_CONTEXT flags setting
So far there's no need in atomic setting of MM context flags in mm_context_t::flags. The flags set early in exec and never change after that.
LAM enabling
x86: Allow atomic MM_CONTEXT flags setting
So far there's no need in atomic setting of MM context flags in mm_context_t::flags. The flags set early in exec and never change after that.
LAM enabling requires atomic flag setting. The upcoming flag MM_CONTEXT_FORCE_TAGGED_SVA can be set much later in the process lifetime where multiple threads exist.
Convert the field to unsigned long and do MM_CONTEXT_* accesses with __set_bit() and test_bit().
No functional changes.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/all/20230312112612.31869-3-kirill.shutemov%40linux.intel.com
show more ...
|
Revision tags: v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11 |
|
#
c9ae1b10 |
| 07-Feb-2023 |
Juergen Gross <jgross@suse.com> |
x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
The two paravirt callbacks .mmu.activate_mm() and .mmu.dup_mmap() are sharing the same implementations in all cases: for Xen PV guests they
x86/paravirt: Merge activate_mm() and dup_mmap() callbacks
The two paravirt callbacks .mmu.activate_mm() and .mmu.dup_mmap() are sharing the same implementations in all cases: for Xen PV guests they are pinning the PGD of the new mm_struct, and for all other cases they are a NOP.
In the end, both callbacks are meant to register an address space with the underlying hypervisor, so there needs to be only a single callback for that purpose.
So merge them to a common callback .mmu.enter_mmap() (in contrast to the corresponding already existing .mmu.exit_mmap()).
As the first parameter of the old callbacks isn't used, drop it from the replacement one.
[ bp: Remove last occurrence of paravirt_activate_mm() in asm/mmu_context.h ]
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Link: https://lore.kernel.org/r/20230207075902.7539-1-jgross@suse.com
show more ...
|
Revision tags: v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19 |
|
#
ae53fa18 |
| 12-Jan-2023 |
H. Peter Anvin (Intel) <hpa@zytor.com> |
x86/gsseg: Move load_gs_index() to its own new header file
GS is a special segment on x86_64, move load_gs_index() to its own new header file to simplify header inclusion.
No change in functionalit
x86/gsseg: Move load_gs_index() to its own new header file
GS is a special segment on x86_64, move load_gs_index() to its own new header file to simplify header inclusion.
No change in functionality.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20230112072032.35626-5-xin3.li@intel.com
show more ...
|
Revision tags: v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32 |
|
#
3a24a608 |
| 25-Mar-2022 |
Brian Gerst <brgerst@gmail.com> |
x86/32: Remove lazy GS macros
GS is always a user segment now.
Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutron
x86/32: Remove lazy GS macros
GS is always a user segment now.
Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220325153953.162643-4-brgerst@gmail.com
show more ...
|
Revision tags: v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10, v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62 |
|
#
586c4f24 |
| 01-Sep-2020 |
Nicholas Piggin <npiggin@gmail.com> |
x86: use asm-generic/mmu_context.h for no-op implementations
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linu
x86: use asm-generic/mmu_context.h for no-op implementations
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
show more ...
|
#
ff170cd0 |
| 03-Oct-2020 |
Gabriel Krisman Bertazi <krisman@collabora.com> |
x86/mm: Convert mmu context ia32_compat into a proper flags field
The ia32_compat attribute is a weird thing. It mirrors TIF_IA32 and TIF_X32 and is used only in two very unrelated places: (1) to d
x86/mm: Convert mmu context ia32_compat into a proper flags field
The ia32_compat attribute is a weird thing. It mirrors TIF_IA32 and TIF_X32 and is used only in two very unrelated places: (1) to decide if the vsyscall page is accessible (2) for uprobes to find whether the patched instruction is 32 or 64 bit.
In preparation to remove the TIF flags, a new mechanism is required for ia32_compat, but given its odd semantics, adding a real flags field which configures these specific behaviours is the best option.
So, set_personality_x64() can ask for the vsyscall page, which is not available in x32/ia32 and set_personality_ia32() can configure the uprobe code as needed.
uprobe cannot rely on other methods like user_64bit_mode() to decide how to patch, so it needs some specific flag like this.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski<luto@kernel.org> Link: https://lore.kernel.org/r/20201004032536.1229030-10-krisman@collabora.com
show more ...
|
Revision tags: v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57 |
|
#
ca15ca40 |
| 07-Aug-2020 |
Mike Rapoport <rppt@linux.ibm.com> |
mm: remove unneeded includes of <asm/pgalloc.h>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermedi
mm: remove unneeded includes of <asm/pgalloc.h>
Patch series "mm: cleanup usage of <asm/pgalloc.h>"
Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate.
In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h>
In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/.
This patch (of 8):
In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header.
As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file.
The process was somewhat automated using
sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h'))
where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h.
[rppt@linux.ibm.com: fix powerpc warning]
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Joerg Roedel <joro@8bytes.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35 |
|
#
9020d395 |
| 21-Apr-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/alternatives: Move temporary_mm helpers into C
The only user of these inlines is the text poke code and this must not be exposed to the world.
No functional change.
Signed-off-by: Thomas Gleix
x86/alternatives: Move temporary_mm helpers into C
The only user of these inlines is the text poke code and this must not be exposed to the world.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.139069561@linutronix.de
show more ...
|
#
cb2a0235 |
| 21-Apr-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/cr4: Sanitize CR4.PCE update
load_mm_cr4_irqsoff() is really a strange name for a function which has only one purpose: Update the CR4.PCE bit depending on the perf state.
Rename it to update_cr
x86/cr4: Sanitize CR4.PCE update
load_mm_cr4_irqsoff() is really a strange name for a function which has only one purpose: Update the CR4.PCE bit depending on the perf state.
Rename it to update_cr4_pce_mm(), move it into the tlb code and provide a function which can be invoked by the perf smp function calls.
Another step to remove exposure of cpu_tlbstate.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200421092559.049499158@linutronix.de
show more ...
|
#
8c5cc19e |
| 21-Apr-2020 |
Thomas Gleixner <tglx@linutronix.de> |
x86/tlb: Uninline __get_current_cr3_fast()
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed b
x86/tlb: Uninline __get_current_cr3_fast()
cpu_tlbstate is exported because various TLB-related functions need access to it, but cpu_tlbstate is sensitive information which should only be accessed by well-contained kernel functions and not be directly exposed to modules.
In preparation for unexporting cpu_tlbstate move __get_current_cr3_fast() into the x86 TLB management code.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200421092558.848064318@linutronix.de
show more ...
|
Revision tags: v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30 |
|
#
7969f226 |
| 01-Apr-2020 |
Anshuman Khandual <anshuman.khandual@arm.com> |
mm/vma: make vma_is_foreign() available for general use
Idea of a foreign VMA with respect to the present context is very generic. But currently there are two identical definitions for this in power
mm/vma: make vma_is_foreign() available for general use
Idea of a foreign VMA with respect to the present context is very generic. But currently there are two identical definitions for this in powerpc and x86 platforms. Lets consolidate those redundant definitions while making vma_is_foreign() available for general use later. This should not cause any functional change.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Link: http://lkml.kernel.org/r/1582782965-3274-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15 |
|
#
45fc24e8 |
| 23-Jan-2020 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/mpx: remove MPX from arch/x86
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc).
This removes a
x86/mpx: remove MPX from arch/x86
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc).
This removes all the remaining (dead at this point) MPX handling code remaining in the tree. The only remaining code is the XSAVE support for MPX state which is currently needd for KVM to handle VMs which might use MPX.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
show more ...
|
#
42222eae |
| 23-Jan-2020 |
Dave Hansen <dave.hansen@linux.intel.com> |
mm: remove arch_bprm_mm_init() hook
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc).
arch_bprm_mm
mm: remove arch_bprm_mm_init() hook
From: Dave Hansen <dave.hansen@linux.intel.com>
MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc).
arch_bprm_mm_init() is used at execve() time. The only non-stub implementation is on x86 for MPX. Remove the hook entirely from all architectures and generic code.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: x86@kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
show more ...
|
Revision tags: v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14 |
|
#
186525bd |
| 29-Nov-2019 |
Ingo Molnar <mingo@kernel.org> |
mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions
- Untangle the somewhat incestous way of how VMALLOC_START is used all across the kernel, but is, on x86,
mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions
- Untangle the somewhat incestous way of how VMALLOC_START is used all across the kernel, but is, on x86, defined deep inside one of the lowest level page table headers. It doesn't help that vmalloc.h only includes a single asm header:
#include <asm/page.h> /* pgprot_t */
So there was no existing cross-arch way to decouple address layout definitions from page.h details. I used this:
#ifndef VMALLOC_START # include <asm/vmalloc.h> #endif
This way every architecture that wants to simplify page.h can do so.
- Also on x86 we had a couple of LDT related inline functions that used the late-stage address space layout positions - but these could be uninlined without real trouble - the end result is cleaner this way as well.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
405b4537 |
| 24-Nov-2019 |
Anthony Steinhauser <asteinhauser@google.com> |
perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0
When you successfully write 0 to /sys/devices/cpu/rdpmc, the RDPMC instruction should be disabled unconditionally and i
perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0
When you successfully write 0 to /sys/devices/cpu/rdpmc, the RDPMC instruction should be disabled unconditionally and immediately (after you close the SYSFS file) by the documentation.
Instead, in the current implementation the PMU must be reloaded which happens only eventually some time in the future. Only after that the RDPMC instruction becomes disabled (on ring 3) on the respective core.
This change makes the treatment of the 0 value as blocking and as unconditional as the current treatment of the 2 value, only the CR4.PCE bit is naturally set to false instead of true.
Signed-off-by: Anthony Steinhauser <asteinhauser@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: https://lkml.kernel.org/r/20191125054838.137615-1-asteinhauser@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
Revision tags: v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10, v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12 |
|
#
21e450d2 |
| 18-Jun-2019 |
Jan Kiszka <jan.kiszka@siemens.com> |
x86/mm: Avoid redundant interrupt disable in load_mm_cr4()
load_mm_cr4() is always called with interrupts disabled from:
- switch_mm_irqs_off() - refresh_pce(), which is a on_each_cpu() callback
x86/mm: Avoid redundant interrupt disable in load_mm_cr4()
load_mm_cr4() is always called with interrupts disabled from:
- switch_mm_irqs_off() - refresh_pce(), which is a on_each_cpu() callback
Thus, disabling interrupts in cr4_set/clear_bits() is redundant.
Implement cr4_set/clear_bits_irqsoff() helpers, rename load_mm_cr4() to load_mm_cr4_irqsoff() and use the new helpers. The new helpers do not need a lockdep assert as __cr4_set() has one already.
The renaming in combination with the checks in __cr4_set() ensure that any changes in the boundary conditions at the call sites will be detected.
[ tglx: Massaged change log ]
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/0fbbcb64-5f26-4ffb-1bb9-4f5f48426893@siemens.com
show more ...
|
Revision tags: v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4, v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9 |
|
#
5a28fc94 |
| 19-Apr-2019 |
Dave Hansen <dave.hansen@linux.intel.com> |
x86/mpx, mm/core: Fix recursive munmap() corruption
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody
x86/mpx, mm/core: Fix recursive munmap() corruption
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody uses MPX.
MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more.
See the "long story" further below for the gory details.
To fix this, call arch_unmap() before __do_unmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup.
== UML / unicore32 impact ==
Remove unused 'vma' argument to arch_unmap(). No functional change.
I compile tested this on UML but not unicore32.
== powerpc impact ==
powerpc uses arch_unmap() well to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I can not find any likely impact to powerpc, other than the zeroing happening a little earlier.
powerpc does not use the 'vma' argument and is unaffected by its removal.
I compile-tested a 64-bit powerpc defconfig.
== x86 impact ==
For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap(); // use ptr ret = munmap(ptr); if (ret) // oh, there was an error, I'll // keep using ptr.
Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as well as the existing mpx selftests/.
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s) 2. Split the start/end one(s) if neceesary 3. Pull the VMAs out of the rbtree 4. Actually zap the memory via unmap_region(), including freeing page tables (or queueing them to be freed). 5. Fix up some of the accounting (like fput()) and actually free the VMA itself.
This specific ordering was actually introduced by:
dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that has not been properly zapped.
But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use.
I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree which would force __do_munmap() to re-walk to restart. That spiralled out of control in complexity pretty fast.
Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix.
This was also reported in the following kernel bugzilla entry:
https://bugzilla.kernel.org/show_bug.cgi?id=203123
There are some reports that this commit triggered this bug:
dd2283f2605 ("mm: mmap: zap pages with read mmap_sem in munmap")
While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits.
[ mingo: Minor edits to the changelog and the patch. ]
Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
d97080eb |
| 25-Apr-2019 |
Nadav Amit <namit@vmware.com> |
x86/mm: Save debug registers when loading a temporary mm
Prevent user watchpoints from mistakenly firing while the temporary mm is being used. As the addresses of the temporary mm might overlap thos
x86/mm: Save debug registers when loading a temporary mm
Prevent user watchpoints from mistakenly firing while the temporary mm is being used. As the addresses of the temporary mm might overlap those of the user-process, this is necessary to prevent wrong signals or worse things from happening.
Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-5-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
cefa929c |
| 25-Apr-2019 |
Andy Lutomirski <luto@kernel.org> |
x86/mm: Introduce temporary mm structs
Using a dedicated page-table for temporary PTEs prevents other cores from using - even speculatively - these PTEs, thereby providing two benefits:
(1) Securit
x86/mm: Introduce temporary mm structs
Using a dedicated page-table for temporary PTEs prevents other cores from using - even speculatively - these PTEs, thereby providing two benefits:
(1) Security hardening: an attacker that gains kernel memory writing abilities cannot easily overwrite sensitive data.
(2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in remote page-tables.
To do so a temporary mm_struct can be used. Mappings which are private for this mm can be set in the userspace part of the address-space. During the whole time in which the temporary mm is loaded, interrupts must be disabled.
The first use-case for temporary mm struct, which will follow, is for poking the kernel text.
[ Commit message was written by Nadav Amit ]
Tested-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-4-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|