History log of /openbmc/linux/mm/khugepaged.c (Results 101 – 125 of 993)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# edb5d0cf 25-Jan-2023 Zach O'Keefe <zokeefe@google.com>

mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups

In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

-

mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups

In commit 34488399fa08 ("mm/madvise: add file and shmem support to
MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none():

- if (!pmd_present(pmde))
- return SCAN_PMD_NULL;
+ if (pmd_none(pmde))
+ return SCAN_PMD_NONE;

This was for-use by MADV_COLLAPSE file/shmem codepaths, where
MADV_COLLAPSE might identify a pte-mapped hugepage, only to have
khugepaged race-in, free the pte table, and clear the pmd. Such codepaths
include:

A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER
already in the pagecache.
B) In retract_page_tables(), if we fail to grab mmap_lock for the target
mm/address.

In these cases, collapse_pte_mapped_thp() really does expect a none (not
just !present) pmd, and we want to suitably identify that case separate
from the case where no pmd is found, or it's a bad-pmd (of course, many
things could happen once we drop mmap_lock, and the pmd could plausibly
undergo multiple transitions due to intervening fault, split, etc).
Regardless, the code is prepared install a huge-pmd only when the existing
pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd.

However, the commit introduces a logical hole; namely, that we've allowed
!none- && !huge- && !bad-pmds to be classified as genuine
pte-table-mapping-pmds. One such example that could leak through are swap
entries. The pmd values aren't checked again before use in
pte_offset_map_lock(), which is expecting nothing less than a genuine
pte-table-mapping-pmd.

We want to put back the !pmd_present() check (below the pmd_none() check),
but need to be careful to deal with subtleties in pmd transitions and
treatments by various arch.

The issue is that __split_huge_pmd_locked() temporarily clears the present
bit (or otherwise marks the entry as invalid), but pmd_present() and
pmd_trans_huge() still need to return true while the pmd is in this
transitory state. For example, x86's pmd_present() also checks the
_PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also
checks a PMD_PRESENT_INVALID bit.

Covering all 4 cases for x86 (all checks done on the same pmd value):

1) pmd_present() && pmd_trans_huge()
All we actually know here is that the PSE bit is set. Either:
a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE
is set.
=> huge-pmd
b) We are currently racing with __split_huge_page(). The danger here
is that we proceed as-if we have a huge-pmd, but really we are
looking at a pte-mapping-pmd. So, what is the risk of this
danger?

The only relevant path is:

madvise_collapse() -> collapse_pte_mapped_thp()

Where we might just incorrectly report back "success", when really
the memory isn't pmd-backed. This is fine, since split could
happen immediately after (actually) successful madvise_collapse().
So, it should be safe to just assume huge-pmd here.

2) pmd_present() && !pmd_trans_huge()
Either:
a) PSE not set and either PRESENT or PROTNONE is.
=> pte-table-mapping pmd (or PROT_NONE)
b) devmap. This routine can be called immediately after
unlocking/locking mmap_lock -- or called with no locks held (see
khugepaged_scan_mm_slot()), so previous VMA checks have since been
invalidated.

3) !pmd_present() && pmd_trans_huge()
Not possible.

4) !pmd_present() && !pmd_trans_huge()
Neither PRESENT nor PROTNONE set
=> not present

I've checked all archs that implement pmd_trans_huge() (arm64, riscv,
powerpc, longarch, x86, mips, s390) and this logic roughly translates
(though devmap treatment is unique to x86 and powerpc, and (3) doesn't
necessarily hold in general -- but that doesn't matter since
!pmd_present() always takes failure path).

Also, add a comment above find_pmd_or_thp_or_none() to help future
travelers reason about the validity of the code; namely, the possible
mutations that might happen out from under us, depending on how mmap_lock
is held (if at all).

Link: https://lkml.kernel.org/r/20230125225358.2576151-1-zokeefe@google.com
Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 023f47a8 11-Jan-2023 Jann Horn <jannh@google.com>

mm/khugepaged: fix ->anon_vma race

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.

Page table traversal is allowed under any one of the mmap lock, the
ano

mm/khugepaged: fix ->anon_vma race

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked.

Page table traversal is allowed under any one of the mmap lock, the
anon_vma lock (if the VMA is associated with an anon_vma), and the
mapping lock (if the VMA is associated with a mapping); and so to be
able to remove page tables, we must hold all three of them.
retract_page_tables() bails out if an ->anon_vma is attached, but does
this check before holding the mmap lock (as the comment above the check
explains).

If we racily merged an existing ->anon_vma (shared with a child
process) from a neighboring VMA, subsequent rmap traversals on pages
belonging to the child will be able to see the page tables that we are
concurrently removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that
there really is no concurrent page table access.

Hitting this bug causes a lockdep warning in collapse_and_free_pmd(),
in the line "lockdep_assert_held_write(&vma->anon_vma->root->rwsem)".
It can also lead to use-after-free access.

Link: https://lore.kernel.org/linux-mm/CAG48ez3434wZBKFFbdx4M9j6eUwSUVPd4dxhzW_k_POneSDF+A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20230111133351.807024-1-jannh@google.com
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Jann Horn <jannh@google.com>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@intel.linux.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 57a30218 31-Jan-2023 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# df5bf3b9 31-Jan-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.2-rc6.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# aebd8f0c 31-Jan-2023 Daniel Vetter <daniel.vetter@ffwll.ch>

Merge v6.2-rc6 into drm-next

Due to holidays we started -next with more -fixes in-flight than
usual, and people have been asking where they are. Backmerge to get
things better in sync.

Conflicts:
-

Merge v6.2-rc6 into drm-next

Due to holidays we started -next with more -fixes in-flight than
usual, and people have been asking where they are. Backmerge to get
things better in sync.

Conflicts:
- Tiny conflict in drm_fbdev_generic.c between variable rename and
missing error handling that got added.
- Conflict in drm_fb_helper.c between the added call to vgaswitcheroo
in drm_fb_helper_single_fb_probe and a refactor patch that extracted
lots of helpers and incidentally removed the dev local variable.
Readd it to make things compile.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

show more ...


# 6397859c 26-Jan-2023 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.2-rc5' into locking/core, to pick up fixes

Refresh this branch with the latest locking fixes, before
applying a new series of changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e3e9fc7f 23-Jan-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.2-rc5 into usb-next

We need the USB fixes in here and this resolves merge conflicts as
reported in linux-next in the following files:
drivers/usb/host/xhci.c
drivers/usb/host/xhci.h
drive

Merge 6.2-rc5 into usb-next

We need the USB fixes in here and this resolves merge conflicts as
reported in linux-next in the following files:
drivers/usb/host/xhci.c
drivers/usb/host/xhci.h
drivers/usb/typec/ucsi/ucsi.c

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# 54573f3e 23-Jan-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.2-rc5 into staging-next

We need the staging fixes in here, and it resolves a merge conflict in:
drivers/staging/vc04_services/include/linux/raspberrypi/vchiq.h

Signed-off-by: Greg Kroah-Ha

Merge 6.2-rc5 into staging-next

We need the staging fixes in here, and it resolves a merge conflict in:
drivers/staging/vc04_services/include/linux/raspberrypi/vchiq.h

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

show more ...


# f89fd043 22-Jan-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.2-rc5 into driver-core-next

We need the driver core fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7a6aa989 22-Jan-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.2-rc5 into tty-next

We need the serial/tty changes into this branch as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 99ba2ad1 22-Jan-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.2-rc5 into char-misc-next

We need the char/misc driver fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b3c588cd 20-Jan-2023 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend")
8e4

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend")
8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()")
d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

show more ...


# 6f849817 19-Jan-2023 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging into drm-misc-next to get DRM accelerator infrastructure,
which is required by ipuv driver.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 1301f931 18-Jan-2023 Andrew Morton <akpm@linux-foundation.org>

Pull mm-hotfixes-stable dependencies into mm-stable.

Merge branch 'mm-hotfixes-stable' into mm-stable


# d0e99511 17-Jan-2023 Kalle Valo <kvalo@kernel.org>

Merge wireless into wireless-next

Due to the two cherry picked commits from wireless to wireless-next we have
several conflicts in mt76. To avoid any bugs with conflicts merge wireless into
wireless

Merge wireless into wireless-next

Due to the two cherry picked commits from wireless to wireless-next we have
several conflicts in mt76. To avoid any bugs with conflicts merge wireless into
wireless-next.

96f134dc1964 wifi: mt76: handle possible mt76_rx_token_consume failures
fe13dad8992b wifi: mt76: dma: do not increment queue head if mt76_dma_add_buf fails

show more ...


# 6e50979a 16-Jan-2023 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"21 hotfixes. Thirteen of these address pre-6.1 issue

Merge tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc hotfixes from Andrew Morton:
"21 hotfixes. Thirteen of these address pre-6.1 issues and hence have
the cc:stable tag"

* tag 'mm-hotfixes-stable-2023-01-16-15-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
init/Kconfig: fix typo (usafe -> unsafe)
nommu: fix split_vma() map_count error
nommu: fix do_munmap() error path
nommu: fix memory leak in do_mmap() error path
MAINTAINERS: update Robert Foss' email address
proc: fix PIE proc-empty-vm, proc-pid-vm tests
mm: update mmap_sem comments to refer to mmap_lock
include/linux/mm: fix release_pages_arg kernel doc comment
lib/win_minmax: use /* notation for regular comments
kasan: mark kasan_kunit_executing as static
nilfs2: fix general protection fault in nilfs_btree_insert()
Docs/admin-guide/mm/zswap: remove zsmalloc's lack of writeback warning
mm/hugetlb: pre-allocate pgtable pages for uffd wr-protects
hugetlb: unshare some PMDs when splitting VMAs
mm: fix vma->anon_name memory leak for anonymous shmem VMAs
mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE
mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end
mm/userfaultfd: enable writenotify while userfaultfd-wp is enabled for a VMA
mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma
mm/hugetlb: fix uffd-wp handling for migration entries in hugetlb_change_protection()
...

show more ...


Revision tags: v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16
# 52dc0310 24-Dec-2022 Zach O'Keefe <zokeefe@google.com>

mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end

MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained with

mm/MADV_COLLAPSE: don't expand collapse when vm_end is past requested end

MADV_COLLAPSE acts on one hugepage-aligned/sized region at a time, until
it has collapsed all eligible memory contained within the bounds supplied
by the user.

At the top of each hugepage iteration we (re)lock mmap_lock and
(re)validate the VMA for eligibility and update variables that might have
changed while mmap_lock was dropped. One thing that might occur is that
the VMA could be resized, and as such, we refetch vma->vm_end to make sure
we don't collapse past the end of the VMA's new end.

However, it's possible that when refetching vma->vm_end that we expand the
region acted on by MADV_COLLAPSE if vma->vm_end is greater than size+len
supplied by the user.

The consequence here is that we may attempt to collapse more memory than
requested, possibly yielding either "too much success" or "false failure"
user-visible results. An example of the former is if we MADV_COLLAPSE the
first 4MiB of a 2TiB mmap()'d file, the incorrect refetch would cause the
operation to block for much longer than anticipated as we attempt to
collapse the entire TiB region. An example of the latter is that applying
MADV_COLLPSE to a 4MiB file mapped to the start of a 6MiB VMA will
successfully collapse the first 4MiB, then incorrectly attempt to collapse
the last hugepage-aligned/sized region -- fail (since readahead/page cache
lookup will fail) -- and report a failure to the user.

I don't believe there is a kernel stability concern here as we always
(re)validate the VMA / region accordingly. Also as Hugh mentions, the
user-visible effects are: we try to collapse more memory than requested
by the user, and/or failing an operation that should have otherwise
succeeded. An example is trying to collapse a 4MiB file contained
within a 12MiB VMA.

Don't expand the acted-on region when refetching vma->vm_end.

Link: https://lkml.kernel.org/r/20221224082035.3197140-1-zokeefe@google.com
Fixes: 4d24de9425f7 ("mm: MADV_COLLAPSE: refetch vm_end after reacquiring mmap_lock")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# ab0c3f12 22-Dec-2022 Hugh Dickins <hughd@google.com>

mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma

uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma

mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma

uprobe_write_opcode() uses collapse_pte_mapped_thp() to restore huge pmd,
when removing a breakpoint from hugepage text: vma->anon_vma is always set
in that case, so undo the prohibition. And MADV_COLLAPSE ought to be able
to collapse some page tables in a vma which happens to have anon_vma set
from CoWing elsewhere.

Is anon_vma lock required? Almost not: if any page other than expected
subpage of the non-anon huge page is found in the page table, collapse is
aborted without making any change. However, it is possible that an anon
page was CoWed from this extent in another mm or vma, in which case a
concurrent lookup might look here: so keep it away while clearing pmd (but
perhaps we shall go back to using pmd_lock() there in future).

Note that collapse_pte_mapped_thp() is exceptional in freeing a page table
without having cleared its ptes: I'm uneasy about that, and had thought
pte_clear()ing appropriate; but exclusive i_mmap lock does fix the
problem, and we would have to move the mmu_notification if clearing those
ptes.

What this fixes is not a dangerous instability. But I suggest Cc stable
because uprobes "healing" has regressed in that way, so this should follow
8d3c106e19e8 into those stable releases where it was backported (and may
want adjustment there - I'll supply backports as needed).

Link: https://lkml.kernel.org/r/b740c9fb-edba-92ba-59fb-7a5592e5dfc@google.com
Fixes: 8d3c106e19e8 ("mm/khugepaged: take the right locks for page table retraction")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 407da561 09-Jan-2023 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.2-rc3' into next

Merge with mainline to bring in timer_shutdown_sync() API.


# 2c55d703 03-Jan-2023 Maxime Ripard <maxime@cerno.tech>

Merge drm/drm-fixes into drm-misc-fixes

Let's start the fixes cycle.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>


# 0d8eae7b 02-Jan-2023 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync up with v6.2-rc1.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# b501d4dc 30-Dec-2022 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-gt-next

Sync after v6.2-rc1 landed in drm-next.

We need to get some dependencies in place before we can merge
the fixes series from Gwan-gyeong and Chris.

Referen

Merge drm/drm-next into drm-intel-gt-next

Sync after v6.2-rc1 landed in drm-next.

We need to get some dependencies in place before we can merge
the fixes series from Gwan-gyeong and Chris.

References: https://lore.kernel.org/all/Y6x5JCDnh2rvh4lA@intel.com/
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# 6599e683 28-Dec-2022 Mauro Carvalho Chehab <mchehab@kernel.org>

Merge tag 'v6.2-rc1' into media_tree

Linux 6.2-rc1

* tag 'v6.2-rc1': (14398 commits)
Linux 6.2-rc1
treewide: Convert del_timer*() to timer_shutdown*()
pstore: Properly assign mem_type propert

Merge tag 'v6.2-rc1' into media_tree

Linux 6.2-rc1

* tag 'v6.2-rc1': (14398 commits)
Linux 6.2-rc1
treewide: Convert del_timer*() to timer_shutdown*()
pstore: Properly assign mem_type property
pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES
cfi: Fix CFI failure with KASAN
perf python: Fix splitting CC into compiler and options
afs: Stop implementing ->writepage()
afs: remove afs_cache_netfs and afs_zap_permits() declarations
afs: remove variable nr_servers
afs: Fix lost servers_outstanding count
ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless
ALSA: azt3328: Remove the unused function snd_azf3328_codec_outl()
gcov: add support for checksum field
test_maple_tree: add test for mas_spanning_rebalance() on insufficient data
maple_tree: fix mas_spanning_rebalance() on insufficient data
hugetlb: really allocate vma lock for all sharable vmas
kmsan: export kmsan_handle_urb
kmsan: include linux/vmalloc.h
mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
mm, mremap: fix mremap() expanding vma with addr inside vma
...

show more ...


# c183e6c3 21-Dec-2022 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Revision tags: v6.1.1, v6.0.15
# a9763aa0 20-Dec-2022 Andrew Morton <akpm@linux-foundation.org>

Merge branch 'master' into mm-nonmm-stable


12345678910>>...40