History log of /openbmc/linux/mm/mlock.c (Results 101 – 125 of 293)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v3.12-rc2, v3.12-rc1, v3.11, v3.11-rc7, v3.11-rc6, v3.11-rc5, v3.11-rc4, v3.11-rc3, v3.11-rc2
# 5c889690 19-Jul-2013 Paul E. McKenney <paulmck@linux.vnet.ibm.com>

mm: Place preemption point in do_mlockall() loop

There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:

>

mm: Place preemption point in do_mlockall() loop

There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel:

> My fuzz tester keeps hitting this. Every instance shows the non-irq stack
> came in from mlockall. I'm only seeing this on one box, but that has more
> ram (8gb) than my other machines, which might explain it.
>
> Dave
>
> INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
> sending NMI to all CPUs:
> NMI backtrace for cpu 3
> CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
> task: ffff88023e743fc0 ti: ffff88022f6f2000 task.ti: ffff88022f6f2000
> RIP: 0010:[<ffffffff810bf7d1>] [<ffffffff810bf7d1>] trace_hardirqs_off_caller+0x21/0xb0
> RSP: 0018:ffff880244e03c30 EFLAGS: 00000046
> RAX: ffff88023e743fc0 RBX: 0000000000000001 RCX: 000000000000003c
> RDX: 000000000000000f RSI: 0000000000000004 RDI: ffffffff81033cab
> RBP: ffff880244e03c38 R08: ffff880243288a80 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000001 R12: ffff880243288a80
> R13: ffff8802437eda40 R14: 0000000000080000 R15: 000000000000d010
> FS: 00007f50ae33b740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000097f000 CR3: 0000000240fa0000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
> ffffffff810bf86d ffff880244e03c98 ffffffff81033cab 0000000000000096
> 000000000000d008 0000000300000002 0000000000000004 0000000000000003
> 0000000000002710 ffffffff81c50d00 ffffffff81c50d00 ffff880244fcde00
> Call Trace:
> <IRQ>
> [<ffffffff810bf86d>] ? trace_hardirqs_off+0xd/0x10
> [<ffffffff81033cab>] __x2apic_send_IPI_mask+0x1ab/0x1c0
> [<ffffffff81033cdc>] x2apic_send_IPI_all+0x1c/0x20
> [<ffffffff81030115>] arch_trigger_all_cpu_backtrace+0x65/0xa0
> [<ffffffff811144b1>] rcu_check_callbacks+0x331/0x8e0
> [<ffffffff8108bfa0>] ? hrtimer_run_queues+0x20/0x180
> [<ffffffff8109e905>] ? sched_clock_cpu+0xb5/0x100
> [<ffffffff81069557>] update_process_times+0x47/0x80
> [<ffffffff810bd115>] tick_sched_handle.isra.16+0x25/0x60
> [<ffffffff810bd231>] tick_sched_timer+0x41/0x60
> [<ffffffff8108ace1>] __run_hrtimer+0x81/0x4e0
> [<ffffffff810bd1f0>] ? tick_sched_do_timer+0x60/0x60
> [<ffffffff8108b93f>] hrtimer_interrupt+0xff/0x240
> [<ffffffff8102de84>] local_apic_timer_interrupt+0x34/0x60
> [<ffffffff81718c5f>] smp_apic_timer_interrupt+0x3f/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8105f101>] ? __do_softirq+0xb1/0x440
> [<ffffffff8105f64d>] irq_exit+0xcd/0xe0
> [<ffffffff81718c65>] smp_apic_timer_interrupt+0x45/0x60
> [<ffffffff817178ef>] apic_timer_interrupt+0x6f/0x80
> <EOI>
> [<ffffffff8170e8e0>] ? retint_restore_args+0xe/0xe
> [<ffffffff8170b830>] ? wait_for_completion_killable+0x170/0x170
> [<ffffffff8170c853>] ? preempt_schedule_irq+0x53/0x90
> [<ffffffff8170e9f6>] retint_kernel+0x26/0x30
> [<ffffffff8107a523>] ? queue_work_on+0x43/0x90
> [<ffffffff8107c369>] schedule_on_each_cpu+0xc9/0x1a0
> [<ffffffff81167770>] ? lru_add_drain+0x50/0x50
> [<ffffffff811677c5>] lru_add_drain_all+0x15/0x20
> [<ffffffff81186965>] SyS_mlockall+0xa5/0x1a0
> [<ffffffff81716e94>] tracesys+0xdd/0xe2

This commit addresses this problem by inserting the required preemption
point.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 7a8010cd 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: manual pte walk in fast path instead of follow_page_mask()

Currently munlock_vma_pages_range() calls follow_page_mask() to obtain
each individual struct page. This entails repeated ful

mm: munlock: manual pte walk in fast path instead of follow_page_mask()

Currently munlock_vma_pages_range() calls follow_page_mask() to obtain
each individual struct page. This entails repeated full page table
translations and page table lock taken for each page separately.

This patch avoids the costly follow_page_mask() where possible, by
iterating over ptes within single pmd under single page table lock. The
first pte is obtained by get_locked_pte() for non-THP page acquired by the
initial follow_page_mask(). The rest of the on-stack pagevec for munlock
is filled up using pte_walk as long as pte_present() and vm_normal_page()
are sufficient to obtain the struct page.

After this patch, a 14% speedup was measured for munlocking a 56GB large
memory area with THP disabled.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jörn Engel <joern@logfs.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 5b40998a 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: remove redundant get_page/put_page pair on the fast path

The performance of the fast path in munlock_vma_range() can be further
improved by avoiding atomic ops of a redundant get_page()

mm: munlock: remove redundant get_page/put_page pair on the fast path

The performance of the fast path in munlock_vma_range() can be further
improved by avoiding atomic ops of a redundant get_page()/put_page() pair.

When calling get_page() during page isolation, we already have the pin
from follow_page_mask(). This pin will be then returned by
__pagevec_lru_add(), after which we do not reference the pages anymore.

After this patch, an 8% speedup was measured for munlocking a 56GB large
memory area with THP disabled.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 56afe477 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: bypass per-cpu pvec for putback_lru_page

After introducing batching by pagevecs into munlock_vma_range(), we can
further improve performance by bypassing the copying into per-cpu pageve

mm: munlock: bypass per-cpu pvec for putback_lru_page

After introducing batching by pagevecs into munlock_vma_range(), we can
further improve performance by bypassing the copying into per-cpu pagevec
and the get_page/put_page pair associated with that. Instead we perform
LRU putback directly from our pagevec. However, this is possible only for
single-mapped pages that are evictable after munlock. Unevictable pages
require rechecking after putting on the unevictable list, so for those we
fallback to putback_lru_page(), hich handles that.

After this patch, a 13% speedup was measured for munlocking a 56GB large
memory area with THP disabled.

[akpm@linux-foundation.org:clarify comment]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 1ebb7cc6 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: batch NR_MLOCK zone state updates

Depending on previous batch which introduced batched isolation in
munlock_vma_range(), we can batch also the updates of NR_MLOCK page stats.
After the

mm: munlock: batch NR_MLOCK zone state updates

Depending on previous batch which introduced batched isolation in
munlock_vma_range(), we can batch also the updates of NR_MLOCK page stats.
After the whole pagevec is processed for page isolation, the stats are
updated only once with the number of successful isolations. There were
however no measurable perfomance gains.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 7225522b 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: batch non-THP page isolation and munlock+putback using pagevec

Currently, munlock_vma_range() calls munlock_vma_page on each page in a
loop, which results in repeated taking and releasi

mm: munlock: batch non-THP page isolation and munlock+putback using pagevec

Currently, munlock_vma_range() calls munlock_vma_page on each page in a
loop, which results in repeated taking and releasing of the lru_lock
spinlock for isolating pages one by one. This patch batches the munlock
operations using an on-stack pagevec, so that isolation is done under
single lru_lock. For THP pages, the old behavior is preserved as they
might be split while putting them into the pagevec. After this patch, a
9% speedup was measured for munlocking a 56GB large memory area with THP
disabled.

A new function __munlock_pagevec() is introduced that takes a pagevec and:
1) It clears PageMlocked and isolates all pages under lru_lock. Zone page
stats can be also updated using the variant which assumes disabled
interrupts. 2) It finishes the munlock and lru putback on all pages under
their lock_page. Note that previously, lock_page covered also the
PageMlocked clearing and page isolation, but it is not needed for those
operations.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 586a32ac 11-Sep-2013 Vlastimil Babka <vbabka@suse.cz>

mm: munlock: remove unnecessary call to lru_add_drain()

In munlock_vma_range(), lru_add_drain() is currently called in a loop
before each munlock_vma_page() call.

This is suboptimal for performance

mm: munlock: remove unnecessary call to lru_add_drain()

In munlock_vma_range(), lru_add_drain() is currently called in a loop
before each munlock_vma_page() call.

This is suboptimal for performance when munlocking many pages. The
benefits of per-cpu pagevec for batching the LRU putback are removed since
the pagevec only holds at most one page from the previous loop's
iteration.

The lru_add_drain() call also does not serve any purposes for correctness
- it does not even drain pagavecs of all cpu's. The munlock code already
expects and handles situations where a page cannot be isolated from the
LRU (e.g. because it is on some per-cpu pagevec).

The history of the (not commented) call also suggest that it appears there
as an oversight rather than intentionally. Before commit ff6a6da6 ("mm:
accelerate munlock() treatment of THP pages") the call happened only once
upon entering the function. The commit has moved the call into the while
loope. So while the other changes in the commit improved munlock
performance for THP pages, it introduced the abovementioned suboptimal
per-cpu pagevec usage.

Further in history, before commit 408e82b7 ("mm: munlock use
follow_page"), munlock_vma_pages_range() was just a wrapper around
__mlock_vma_pages_range which performed both mlock and munlock depending
on a flag. However, before ba470de4 ("mmap: handle mlocked pages during
map, remap, unmap") the function handled only mlock, not munlock. The
lru_add_drain call thus comes from the implementation in commit b291f000
("mlock: mlocked pages are unevictable" and was intended only for
mlocking, not munlocking. The original intention of draining the LRU
pagevec at mlock time was to ensure the pages were on the LRU before the
lock operation so that they could be placed on the unevictable list
immediately. There is very little motivation to do the same in the
munlock path this, particularly for every single page.

This patch therefore removes the call completely. After removing the
call, a 10% speedup was measured for munlock() of a 56GB large memory area
with THP disabled.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jörn Engel <joern@logfs.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.11-rc1, v3.10, v3.10-rc7, v3.10-rc6, v3.10-rc5, v3.10-rc4, v3.10-rc3, v3.10-rc2, v3.10-rc1, v3.9, v3.9-rc8, v3.9-rc7, v3.9-rc6, v3.9-rc5
# 09a9f1d2 28-Mar-2013 Michel Lespinasse <walken@google.com>

Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"

This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").

Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace programs"

This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to
better deal with racy userspace programs").

VM_POPULATE only has any effect when userspace plays racy games with
vmas by trying to unmap and remap memory regions that mmap or mlock are
operating on.

Also, the only effect of VM_POPULATE when userspace plays such games is
that it avoids populating new memory regions that get remapped into the
address range that was being operated on by the original mmap or mlock
calls.

Let's remove VM_POPULATE as there isn't any strong argument to mandate a
new vm_flag.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.9-rc4, v3.9-rc3, v3.9-rc2, v3.9-rc1
# ff6a6da6 27-Feb-2013 Michel Lespinasse <walken@google.com>

mm: accelerate munlock() treatment of THP pages

munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE
at a time. When munlocking THP pages (or the huge zero page), this
resulted

mm: accelerate munlock() treatment of THP pages

munlock_vma_pages_range() was always incrementing addresses by PAGE_SIZE
at a time. When munlocking THP pages (or the huge zero page), this
resulted in taking the mm->page_table_lock 512 times in a row.

We can do better by making use of the page_mask returned by
follow_page_mask (for the huge zero page case), or the size of the page
munlock_vma_page() operated on (for the true THP page case).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 28a35716 22-Feb-2013 Michel Lespinasse <walken@google.com>

mm: use long type for page counts in mm_populate() and get_user_pages()

Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

int main

mm: use long type for page counts in mm_populate() and get_user_pages()

Use long type for page counts in mm_populate() so as to avoid integer
overflow when running the following test code:

int main(void) {
void *p = mmap(NULL, 0x100000000000, PROT_READ,
MAP_PRIVATE | MAP_ANON, -1, 0);
printf("p: %p\n", p);
mlockall(MCL_CURRENT);
printf("done\n");
return 0;
}

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 4805b02e 22-Feb-2013 Johannes Weiner <hannes@cmpxchg.org>

mm/mlock.c: document scary-looking stack expansion mlock chain

The fact that mlock calls get_user_pages, and get_user_pages might call
mlock when expanding a stack looks like a potential recursion.

mm/mlock.c: document scary-looking stack expansion mlock chain

The fact that mlock calls get_user_pages, and get_user_pages might call
mlock when expanding a stack looks like a potential recursion.

However, mlock makes sure the requested range is already contained
within a vma, so no stack expansion will actually happen from mlock.

Should this ever change: the stack expansion mlocks only the newly
expanded range and so will not result in recursive expansion.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 18693050 22-Feb-2013 Michel Lespinasse <walken@google.com>

mm: introduce VM_POPULATE flag to better deal with racy userspace programs

The vm_populate() code populates user mappings without constantly
holding the mmap_sem. This makes it susceptible to racy

mm: introduce VM_POPULATE flag to better deal with racy userspace programs

The vm_populate() code populates user mappings without constantly
holding the mmap_sem. This makes it susceptible to racy userspace
programs: the user mappings may change while vm_populate() is running,
and in this case vm_populate() may end up populating the new mapping
instead of the old one.

In order to reduce the possibility of userspace getting surprised by
this behavior, this change introduces the VM_POPULATE vma flag which
gets set on vmas we want vm_populate() to work on. This way
vm_populate() may still end up populating the new mapping after such a
race, but only if the new mapping is also one that the user has
requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# cea10a19 22-Feb-2013 Michel Lespinasse <walken@google.com>

mm: directly use __mlock_vma_pages_range() in find_extend_vma()

In find_extend_vma(), we don't need mlock_vma_pages_range() to verify
the vma type - we know we're working with a stack. So, we can c

mm: directly use __mlock_vma_pages_range() in find_extend_vma()

In find_extend_vma(), we don't need mlock_vma_pages_range() to verify
the vma type - we know we're working with a stack. So, we can call
directly into __mlock_vma_pages_range(), and remove the last
make_pages_present() call site.

Note that we don't use mm_populate() here, so we can't release the
mmap_sem while allocating new stack pages. This is deemed acceptable,
because the stack vmas grow by a bounded number of pages at a time, and
these are anon pages so we don't have to read from disk to populate
them.

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# bebeb3d6 22-Feb-2013 Michel Lespinasse <walken@google.com>

mm: introduce mm_populate() for populating new vmas

When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newl

mm: introduce mm_populate() for populating new vmas

When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas. This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.

This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was using during mlock() /
mlockall().

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.8
# 9977f0f1 12-Feb-2013 Gerald Schaefer <gerald.schaefer@de.ibm.com>

mm: don't overwrite mm->def_flags in do_mlockall()

With commit 8e72033f2a48 ("thp: make MADV_HUGEPAGE check for
mm->def_flags") the VM_NOHUGEPAGE flag may be set on s390 in
mm->def_flags for certain

mm: don't overwrite mm->def_flags in do_mlockall()

With commit 8e72033f2a48 ("thp: make MADV_HUGEPAGE check for
mm->def_flags") the VM_NOHUGEPAGE flag may be set on s390 in
mm->def_flags for certain processes, to prevent future thp mappings.
This would be overwritten by do_mlockall(), which sets it back to 0 with
an optional VM_LOCKED flag set.

To fix this, instead of overwriting mm->def_flags in do_mlockall(), only
the VM_LOCKED flag should be set or cleared.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.8-rc7, v3.8-rc6, v3.8-rc5, v3.8-rc4, v3.8-rc3, v3.8-rc2, v3.8-rc1, v3.7, v3.7-rc8, v3.7-rc7, v3.7-rc6, v3.7-rc5, v3.7-rc4, v3.7-rc3, v3.7-rc2, v3.7-rc1
# 8449d21f 08-Oct-2012 David Rientjes <rientjes@google.com>

mm, thp: fix mlock statistics

NR_MLOCK is only accounted in single page units: there's no logic to
handle transparent hugepages. This patch checks the appropriate number of
pages to adjust the stat

mm, thp: fix mlock statistics

NR_MLOCK is only accounted in single page units: there's no logic to
handle transparent hugepages. This patch checks the appropriate number of
pages to adjust the statistics by so that the correct amount of memory is
reflected.

Currently:

$ grep Mlocked /proc/meminfo
Mlocked: 19636 kB

#define MAP_SIZE (4 << 30) /* 4GB */

void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
mlock(ptr, MAP_SIZE);

$ grep Mlocked /proc/meminfo
Mlocked: 29844 kB

munlock(ptr, MAP_SIZE);

$ grep Mlocked /proc/meminfo
Mlocked: 19636 kB

And with this patch:

$ grep Mlock /proc/meminfo
Mlocked: 19636 kB

mlock(ptr, MAP_SIZE);

$ grep Mlock /proc/meminfo
Mlocked: 4213664 kB

munlock(ptr, MAP_SIZE);

$ grep Mlock /proc/meminfo
Mlocked: 19636 kB

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Hugh Dickens <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# e6c509f8 08-Oct-2012 Hugh Dickins <hughd@google.com>

mm: use clear_page_mlock() in page_remove_rmap()

We had thought that pages could no longer get freed while still marked as
mlocked; but Johannes Weiner posted this program to demonstrate that
trunca

mm: use clear_page_mlock() in page_remove_rmap()

We had thought that pages could no longer get freed while still marked as
mlocked; but Johannes Weiner posted this program to demonstrate that
truncating an mlocked private file mapping containing COWed pages is still
mishandled:

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
char *map;
int fd;

system("grep mlockfreed /proc/vmstat");
fd = open("chigurh", O_CREAT|O_EXCL|O_RDWR);
unlink("chigurh");
ftruncate(fd, 4096);
map = mmap(NULL, 4096, PROT_WRITE, MAP_PRIVATE, fd, 0);
map[0] = 11;
mlock(map, sizeof(fd));
ftruncate(fd, 0);
close(fd);
munlock(map, sizeof(fd));
munmap(map, 4096);
system("grep mlockfreed /proc/vmstat");
return 0;
}

The anon COWed pages are not caught by truncation's clear_page_mlock() of
the pagecache pages; but unmap_mapping_range() unmaps them, so we ought to
look out for them there in page_remove_rmap(). Indeed, why should
truncation or invalidation be doing the clear_page_mlock() when removing
from pagecache? mlock is a property of mapping in userspace, not a
property of pagecache: an mlocked unmapped page is nonsensical.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ying Han <yinghan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 314e51b9 08-Oct-2012 Konstantin Khlebnikov <khlebnikov@openvz.org>

mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

| effec

mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.6, v3.6-rc7, v3.6-rc6, v3.6-rc5, v3.6-rc4, v3.6-rc3, v3.6-rc2, v3.6-rc1, v3.5, v3.5-rc7, v3.5-rc6, v3.5-rc5, v3.5-rc4, v3.5-rc3, v3.5-rc2, v3.5-rc1, v3.4, v3.4-rc7, v3.4-rc6, v3.4-rc5, v3.4-rc4, v3.4-rc3, v3.4-rc2, v3.4-rc1, v3.3, v3.3-rc7
# 097d5910 06-Mar-2012 Linus Torvalds <torvalds@linux-foundation.org>

vm: avoid using find_vma_prev() unnecessarily

Several users of "find_vma_prev()" were not in fact interested in the
previous vma if there was no primary vma to be found either. And in
those cases,

vm: avoid using find_vma_prev() unnecessarily

Several users of "find_vma_prev()" were not in fact interested in the
previous vma if there was no primary vma to be found either. And in
those cases, we're much better off just using the regular "find_vma()",
and then "prev" can be looked up by just checking vma->vm_prev.

The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
commit 83cd904d271b: "mm: fix find_vma_prev"), and the whole "return
prev by reference" means that it generates worse code too.

Thus this "let's avoid using this inconvenient and clearly too subtle
interface when we don't really have to" patch.

Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.3-rc6, v3.3-rc5, v3.3-rc4, v3.3-rc3, v3.3-rc2, v3.3-rc1, v3.2, v3.2-rc7, v3.2-rc6, v3.2-rc5, v3.2-rc4, v3.2-rc3, v3.2-rc2, v3.2-rc1
# 3d470fc3 31-Oct-2011 Hugh Dickins <hughd@google.com>

mm: munlock use mapcount to avoid terrible overhead

A process spent 30 minutes exiting, just munlocking the pages of a large
anonymous area that had been alternately mprotected into page-sized vmas:

mm: munlock use mapcount to avoid terrible overhead

A process spent 30 minutes exiting, just munlocking the pages of a large
anonymous area that had been alternately mprotected into page-sized vmas:
for every single page there's an anon_vma walk through all the other
little vmas to find the right one.

A general fix to that would be a lot more complicated (use prio_tree on
anon_vma?), but there's one very simple thing we can do to speed up the
common case: if a page to be munlocked is mapped only once, then it is our
vma that it is mapped into, and there's no need whatever to walk through
all the others.

Okay, there is a very remote race in munlock_vma_pages_range(), if between
its follow_page() and lock_page(), another process were to munlock the
same page, then page reclaim remove it from our vma, then another process
mlock it again. We would find it with page_mapcount 1, yet it's still
mlocked in another process. But never mind, that's much less likely than
the down_read_trylock() failure which munlocking already tolerates (in
try_to_unmap_one()): in due course page reclaim will discover and move the
page to unevictable instead.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# df9d6985 31-Oct-2011 Christoph Lameter <cl@gentwo.org>

mm: do not drain pagevecs for mlockall(MCL_FUTURE)

MCL_FUTURE does not move pages between lru list and draining the LRU per
cpu pagevecs is a nasty activity. Avoid doing it unecessarily.

Signed-of

mm: do not drain pagevecs for mlockall(MCL_FUTURE)

MCL_FUTURE does not move pages between lru list and draining the LRU per
cpu pagevecs is a nasty activity. Avoid doing it unecessarily.

Signed-off-by: Christoph Lameter <cl@gentwo.org>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v3.1, v3.1-rc10
# b95f1b31 16-Oct-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

mm: Map most files to use export.h instead of module.h

The files changed within are only using the EXPORT_SYMBOL
macro variants. They are not using core modular infrastructure
and hence don't need

mm: Map most files to use export.h instead of module.h

The files changed within are only using the EXPORT_SYMBOL
macro variants. They are not using core modular infrastructure
and hence don't need module.h but only the export.h header.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

show more ...


Revision tags: v3.1-rc9, v3.1-rc8, v3.1-rc7, v3.1-rc6, v3.1-rc5, v3.1-rc4, v3.1-rc3, v3.1-rc2, v3.1-rc1, v3.0, v3.0-rc7, v3.0-rc6, v3.0-rc5, v3.0-rc4, v3.0-rc3, v3.0-rc2, v3.0-rc1
# ca16d140 26-May-2011 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

mm: don't access vm_flags as 'int'

The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
'unsigned int'. This patch fixes such misuse.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.f

mm: don't access vm_flags as 'int'

The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
'unsigned int'. This patch fixes such misuse.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[ Changed to use a typedef - we'll extend it to cover more cases
later, since there has been discussion about making it a 64-bit
type.. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.39, v2.6.39-rc7
# a1fde08c 04-May-2011 Linus Torvalds <torvalds@linux-foundation.org>

VM: skip the stack guard page lookup in get_user_pages only for mlock

The logic in __get_user_pages() used to skip the stack guard page lookup
whenever the caller wasn't interested in seeing what th

VM: skip the stack guard page lookup in get_user_pages only for mlock

The logic in __get_user_pages() used to skip the stack guard page lookup
whenever the caller wasn't interested in seeing what the actual page
was. But Michel Lespinasse points out that there are cases where we
don't care about the physical page itself (so 'pages' may be NULL), but
do want to make sure a page is mapped into the virtual address space.

So using the existence of the "pages" array as an indication of whether
to look up the guard page or not isn't actually so great, and we really
should just use the FOLL_MLOCK bit. But because that bit was only set
for the VM_LOCKED case (and not all vma's necessarily have it, even for
mlock()), we couldn't do that originally.

Fix that by moving the VM_LOCKED check deeper into the call-chain, which
actually simplifies many things. Now mlock() gets simpler, and we can
also check for FOLL_MLOCK in __get_user_pages() and the code ends up
much more straightforward.

Reported-and-reviewed-by: Michel Lespinasse <walken@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v2.6.39-rc6, v2.6.39-rc5, v2.6.39-rc4
# 95042f9e 12-Apr-2011 Linus Torvalds <torvalds@linux-foundation.org>

vm: fix mlock() on stack guard page

Commit 53a7706d5ed8 ("mlock: do not hold mmap_sem for extended periods
of time") changed mlock() to care about the exact number of pages that
__get_user_pages() h

vm: fix mlock() on stack guard page

Commit 53a7706d5ed8 ("mlock: do not hold mmap_sem for extended periods
of time") changed mlock() to care about the exact number of pages that
__get_user_pages() had brought it. Before, it would only care about
errors.

And that doesn't work, because we also handled one page specially in
__mlock_vma_pages_range(), namely the stack guard page. So when that
case was handled, the number of pages that the function returned was off
by one. In particular, it could be zero, and then the caller would end
up not making any progress at all.

Rather than try to fix up that off-by-one error for the mlock case
specially, this just moves the logic to handle the stack guard page
into__get_user_pages() itself, thus making all the counts come out
right automatically.

Reported-by: Robert Święcki <robert@swiecki.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


12345678910>>...12