Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12 |
|
#
0c16c83e |
| 14-Feb-2023 |
Arnd Bergmann <arnd@arndb.de> |
dax: cxl: add CXL_REGION dependency
There is already a dependency on CXL_REGION, which depends on CXL_BUS, but since CXL_REGION is a 'bool' symbol, it's possible to configure DAX as built-in even th
dax: cxl: add CXL_REGION dependency
There is already a dependency on CXL_REGION, which depends on CXL_BUS, but since CXL_REGION is a 'bool' symbol, it's possible to configure DAX as built-in even though CXL itself is a loadable module:
x86_64-linux-ld: drivers/dax/cxl.o: in function `cxl_dax_region_probe': cxl.c:(.text+0xb): undefined reference to `to_cxl_dax_region' x86_64-linux-ld: drivers/dax/cxl.o: in function `cxl_dax_region_driver_init': cxl.c:(.init.text+0x10): undefined reference to `__cxl_driver_register' x86_64-linux-ld: drivers/dax/cxl.o: in function `cxl_dax_region_driver_exit': cxl.c:(.exit.text+0x9): undefined reference to `cxl_driver_unregister'
Prevent this with another depndency on the tristate symbol.
Fixes: 09d09e04d2fc ("cxl/dax: Create dax devices for CXL RAM regions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230214103054.1082908-1-arnd@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
09d09e04 |
| 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/dax: Create dax devices for CXL RAM regions
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the r
cxl/dax: Create dax devices for CXL RAM regions
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and hot-added devices. Platform firmware is also responsible for identifying the platform general purpose memory pool, typically DDR attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax).
The new 'struct cxl_dax_region' object allows for a CXL memory resource (region) to be published, but also allow for udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules.
Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
e9ee9fe3 |
| 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Assign RAM regions to memory-hotplug by default
The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Wh
dax: Assign RAM regions to memory-hotplug by default
The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Where is my memory?". Recall that platforms are increasingly shipping with performance-differentiated memory pools beyond typical DRAM and NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic interleave, varied media types, and future fabric attached possibilities).
For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux 'Soft Reserved') attribute is expected to be applied to all memory-pools that are not the general purpose pool. This designation gives an Operating System a chance to defer usage of a memory pool until later in the boot process where its performance properties can be interrogated and administrator policy can be applied.
'Soft Reserved' memory can be anything from too limited and precious to be part of the general purpose pool (HBM), too slow to host hot kernel data structures (some PMEM media), or anything in between. However, in the absence of an explicit policy, the memory should at least be made usable by default. The current device-dax default hides all non-general-purpose memory behind a device interface.
The expectation is that the distribution of users that want the memory online by default vs device-dedicated-access by default follows the Pareto principle. A small number of enlightened users may want to do userspace memory management through a device, but general users just want the kernel to make the memory available with an option to get more advanced later.
Arrange for all device-dax instances not backed by PMEM to default to attaching to the dax_kmem driver. From there the baseline memory hotplug policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) gates whether the memory comes online or stays offline. Where, if it stays offline, it can be reliably converted back to device-mode where it can be partitioned, or fronted by a userspace allocator.
So, if someone wants device-dax instances for their 'Soft Reserved' memory:
1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot with memhp_default_state=offline, or roll the dice and hope that the kernel has not pinned a page in that memory before step 2.
2/ Write a udev rule to convert the target dax device(s) from 'system-ram' mode to 'devdax' mode:
daxctl reconfigure-device $dax -m devdax -f
Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
7dab174e |
| 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Move hmem device registration to dax_hmem.ko
In preparation for the CXL region driver to take over the responsibility of registering device-dax instances for CXL regions, move the registra
dax/hmem: Move hmem device registration to dax_hmem.ko
In preparation for the CXL region driver to take over the responsibility of registering device-dax instances for CXL regions, move the registration of "hmem" devices to dax_hmem.ko.
Previously the builtin component of this enabling (drivers/dax/hmem/device.o) would register platform devices for each address range and trigger the dax_hmem.ko module to load and attach device-dax instances to those devices. Now, the ranges are collected from the HMAT and EFI memory map walking, but the device creation is deferred. A new "hmem_platform" device is created which triggers dax_hmem.ko to load and register the platform devices.
Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602002771.1924368.5653558226424530127.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80 |
|
#
a870acc1 |
| 22-Nov-2022 |
Paul E. McKenney <paulmck@kernel.org> |
drivers/dax: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements.
drivers/dax: Remove "select SRCU"
Now that the SRCU Kconfig option is unconditionally selected, there is no longer any point in selecting it. Therefore, remove the "select SRCU" Kconfig statements.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: <nvdimm@lists.linux.dev> Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: John Ogness <john.ogness@linutronix.de>
show more ...
|
Revision tags: v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45, v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6 |
|
#
afd586f0 |
| 29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove CONFIG_DAX_DRIVER
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: h
dax: remove CONFIG_DAX_DRIVER
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v5.15.5, v5.15.4, v5.15.3 |
|
#
83762cb5 |
| 15-Nov-2021 |
Dan Williams <dan.j.williams@intel.com> |
dax: Kill DEV_DAX_PMEM_COMPAT
The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/
dax: Kill DEV_DAX_PMEM_COMPAT
The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/bus/dax device-model. Delete it now and see if anyone screams.
Since this compatibility option shipped there has been more reports of users being surprised by the compat ABI than surprised by the "new", so the compat infrastructure has outlived its usefulness. Recall that /sys/bus/dax device-model is required for the dax kmem driver which allows PMEM to be used as "System RAM".
The following projects were known to have a dependency on /sys/class/dax and have dropped their dependency as of the listed version:
- ndctl (including libndctl, daxctl, and libdaxctl): v64+ - fio: v3.13+ - pmdk: v1.5.2+
As further evidence this option is no longer needed some distributions have already stopped enabling CONFIG_DEV_DAX_PMEM_COMPAT.
Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/163701116195.3784476.726128179293466337.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
a927bd6b |
| 22-Nov-2020 |
Dan Williams <dan.j.williams@intel.com> |
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_t
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of:
CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y
...and:
CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y
...it failed to export the symbol in the case of:
CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n
Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too.
Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation.
The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header.
The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG.
[dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.8.17, v5.8.16, v5.8.15 |
|
#
5ccac54f |
| 13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
ACPI: HMAT: attach a device for each soft-reserved range
The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device") only registered ranges to
ACPI: HMAT: attach a device for each soft-reserved range
The hmem enabling in commit cf8741ac57ed ("ACPI: NUMA: HMAT: Register "soft reserved" memory as an "hmem" device") only registered ranges to the hmem driver for each soft-reservation that also appeared in the HMAT. While this is meant to encourage platform firmware to "do the right thing" and publish an HMAT, the corollary is that platforms that fail to publish an accurate HMAT will strand memory from Linux usage. Additionally, the "efi_fake_mem" kernel command line option enabling will strand memory by default without an HMAT.
Arrange for "soft reserved" memory that goes unclaimed by HMAT entries to be published as raw resource ranges for the hmem driver to consume.
Include a module parameter to disable either this fallback behavior, or the hmat enabling from creating hmem devices. The module parameter requires the hmem device enabling to have unique name in the module namespace: "device_hmem".
The driver depends on the architecture providing phys_to_target_node() which is only x86 via numa_meminfo() and arm64 via a generic memblock implementation.
[joao.m.martins@oracle.com: require NUMA_KEEP_MEMINFO for phys_to_target_node()] Link: https://lkml.kernel.org/r/aaae71a7-4846-f5cc-5acf-cf05fdb1f2dc@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jia He <justin.he@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643098298.4062302.17587338161136144730.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
c01044cc |
| 13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
In preparation for exposing "Soft Reserved" memory ranges without an HMAT, move the hmem device registration to its own compi
ACPI: HMAT: refactor hmat_register_target_device to hmem_register_device
In preparation for exposing "Soft Reserved" memory ranges without an HMAT, move the hmem device registration to its own compilation unit and make the implementation generic.
The generic implementation drops usage acpi_map_pxm_to_online_node() that was translating ACPI proximity domain values and instead relies on numa_map_to_online_node() to determine the numa node for the device.
[joao.m.martins@oracle.com: CONFIG_DEV_DAX_HMEM_DEVICES should depend on CONFIG_DAX=y] Link: https://lkml.kernel.org/r/8f34727f-ec2d-9395-cb18-969ec8a5d0d4@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643096584.4062302.5035370788475153738.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/158318761484.2216124.2049322072599482736.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
Revision tags: v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6, v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28, v5.4.27, v5.4.26, v5.4.25, v5.4.24, v5.4.23, v5.4.22, v5.4.21, v5.4.20, v5.4.19, v5.4.18, v5.4.17, v5.4.16, v5.5, v5.4.15, v5.4.14, v5.4.13, v5.4.12, v5.4.11, v5.4.10, v5.4.9, v5.4.8, v5.4.7, v5.4.6, v5.4.5, v5.4.4, v5.4.3, v5.3.15, v5.4.2, v5.4.1, v5.3.14, v5.4, v5.3.13, v5.3.12, v5.3.11, v5.3.10 |
|
#
a6c7f4c6 |
| 06-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Add a driver for "hmem" devices
Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an app
device-dax: Add a driver for "hmem" devices
Platform firmware like EFI/ACPI may publish "hmem" platform devices. Such a device is a performance differentiated memory range likely reserved for an application specific use case. The driver gives access to 100% of the capacity via a device-dax mmap instance by default.
However, if over-subscription and other kernel memory management is desired the resulting dax device can be assigned to the core-mm via the kmem driver.
This consumes "hmem" devices the producer of "hmem" devices is saved for a follow-on patch so that it can reference the new CONFIG_DEV_DAX_HMEM symbol to gate performing the enumeration work.
Reported-by: kbuild test robot <lkp@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
Revision tags: v5.3.9, v5.3.8, v5.3.7, v5.3.6, v5.3.5, v5.3.4, v5.3.3, v5.3.2, v5.3.1, v5.3, v5.2.14, v5.3-rc8, v5.2.13, v5.2.12, v5.2.11, v5.2.10, v5.2.9, v5.2.8, v5.2.7, v5.2.6, v5.2.5, v5.2.4, v5.2.3, v5.2.2, v5.2.1, v5.2, v5.1.16, v5.1.15, v5.1.14, v5.1.13, v5.1.12, v5.1.11, v5.1.10, v5.1.9, v5.1.8, v5.1.7, v5.1.6, v5.1.5, v5.1.4 |
|
#
ec8f24b7 |
| 19-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project
treewide: Add SPDX license identifier - Makefile/Kconfig
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v5.1.3, v5.1.2, v5.1.1, v5.0.14, v5.1, v5.0.13, v5.0.12, v5.0.11, v5.0.10, v5.0.9, v5.0.8, v5.0.7, v5.0.6 |
|
#
67476656 |
| 01-Apr-2019 |
Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> |
drivers/dax: Allow to include DEV_DAX_PMEM as builtin
This move the dependency to DEV_DAX_PMEM_COMPAT such that only if DEV_DAX_PMEM is built as module we can allow the compat support.
This allows
drivers/dax: Allow to include DEV_DAX_PMEM as builtin
This move the dependency to DEV_DAX_PMEM_COMPAT such that only if DEV_DAX_PMEM is built as module we can allow the compat support.
This allows to test the new code easily in a emulation setup where we often build things without module support.
Cc: <stable@vger.kernel.org> Fixes: 730926c3b099 ("device-dax: Add /sys/class/dax backwards compatibility") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v5.0.5, v5.0.4, v5.0.3, v4.19.29, v5.0.2, v4.19.28, v5.0.1, v4.19.27, v5.0, v4.19.26 |
|
#
c221c0b0 |
| 25-Feb-2019 |
Dave Hansen <dave.hansen@linux.intel.com> |
device-dax: "Hotplug" persistent memory for use like normal RAM
This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effect
device-dax: "Hotplug" persistent memory for use like normal RAM
This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement. Intel Optane DC persistent memory is one implementation of this kind of NVDIMM.
Currently, a persistent memory region is "owned" by a device driver, either the "Direct DAX" or "Filesystem DAX" drivers. These drivers allow applications to explicitly use persistent memory, generally by being modified to use special, new libraries. (DIMM-based persistent memory hardware/software is described in great detail here: Documentation/nvdimm/nvdimm.txt).
However, this limits persistent memory use to applications which *have* been modified. To make it more broadly usable, this driver "hotplugs" memory into the kernel, to be managed and used just like normal RAM would be.
To make this work, management software must remove the device from being controlled by the "Device DAX" infrastructure:
echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
and then tell the new driver that it can bind to the device:
echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
After this, there will be a number of new memory sections visible in sysfs that can be onlined, or that may get onlined by existing udev-initiated memory hotplug rules.
This rebinding procedure is currently a one-way trip. Once memory is bound to "kmem", it's there permanently and can not be unbound and assigned back to device_dax.
The kmem driver will never bind to a dax device unless the device is *explicitly* bound to the driver. There are two reasons for this: One, since it is a one-way trip, it can not be undone if bound incorrectly. Two, the kmem driver destroys data on the device. Think of if you had good data on a pmem device. It would be catastrophic if you compile-in "kmem", but leave out the "device_dax" driver. kmem would take over the device and write volatile data all over your good data.
This inherits any existing NUMA information for the newly-added memory from the persistent memory device that came from the firmware. On Intel platforms, the firmware has guarantees that require each socket's persistent memory to be in a separate memory-only NUMA node. That means that this patch is not expected to create NUMA nodes, but will simply hotplug memory into existing nodes.
Because NUMA nodes are created, the existing NUMA APIs and tools are sufficient to create policies for applications or memory areas to have affinity for or an aversion to using this memory.
There is currently some metadata at the beginning of pmem regions. The section-size memory hotplug restrictions, plus this small reserved area can cause the "loss" of a section or two of capacity. This should be fixable in follow-on patches. But, as a first step, losing 256MB of memory (worst case) out of hundreds of gigabytes is a good tradeoff vs. the required code to fix this up precisely. This calculation is also the reason we export memory_block_size_bytes().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: linux-nvdimm@lists.01.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: Huang Ying <ying.huang@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.19.25, v4.19.24, v4.19.23, v4.19.22, v4.19.21, v4.19.20, v4.19.19, v4.19.18, v4.19.17, v4.19.16, v4.19.15, v4.19.14, v4.19.13, v4.19.12, v4.19.11, v4.19.10, v4.19.9, v4.19.8, v4.19.7, v4.19.6, v4.19.5, v4.19.4, v4.18.20, v4.19.3, v4.18.19, v4.19.2, v4.18.18, v4.18.17, v4.19.1, v4.19, v4.18.16, v4.18.15, v4.18.14, v4.18.13, v4.18.12, v4.18.11, v4.18.10, v4.18.9, v4.18.7, v4.18.6, v4.18.5, v4.17.18, v4.18.4, v4.18.3, v4.17.17, v4.18.2, v4.17.16, v4.17.15, v4.18.1, v4.18, v4.17.14, v4.17.13, v4.17.12, v4.17.11, v4.17.10, v4.17.9, v4.17.8, v4.17.7, v4.17.6, v4.17.5, v4.17.4, v4.17.3, v4.17.2, v4.17.1, v4.17, v4.16, v4.15, v4.13.16, v4.14, v4.13.5, v4.13 |
|
#
730926c3 |
| 16-Jul-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Add /sys/class/dax backwards compatibility
On the expectation that some environments may not upgrade libdaxctl (userspace component that depends on the /sys/class/dax hierarchy), provide
device-dax: Add /sys/class/dax backwards compatibility
On the expectation that some environments may not upgrade libdaxctl (userspace component that depends on the /sys/class/dax hierarchy), provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat driver implements the original /sys/class/dax sysfs layout rather than /sys/bus/dax. When userspace is upgraded it can blacklist this module and switch to the dax_pmem driver going forward.
CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according to the dax_pmem entry in Documentation/ABI/obsolete/.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
2080e88a |
| 29-Mar-2018 |
Dan Williams <dan.j.williams@intel.com> |
dax: introduce CONFIG_DAX_DRIVER
In support of allowing device-mapper to compile out idle/dead code when there are no dax providers in the system, introduce the DAX_DRIVER symbol. This is selected b
dax: introduce CONFIG_DAX_DRIVER
In support of allowing device-mapper to compile out idle/dead code when there are no dax providers in the system, introduce the DAX_DRIVER symbol. This is selected by all leaf drivers that device-mapper might be layered on top. This allows device-mapper to conditionally 'select DAX' only when a provider is present.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.12, v4.10.17, v4.10.16 |
|
#
cf1e2289 |
| 08-May-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: kill NR_DEV_DAX
There is no point to ask how many device-dax instances the kernel should support. Since we are already using a dynamic major number, just allow the max number of minors b
device-dax: kill NR_DEV_DAX
There is no point to ask how many device-dax instances the kernel should support. Since we are already using a dynamic major number, just allow the max number of minors by default and be done. This also fixes the fact that the proposed max for the NR_DEV_DAX range was larger than what could be supported by alloc_chrdev_region().
Fixes: ba09c01d2fa8 ("dax: convert to the cdev api") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.10.15 |
|
#
74d71a01 |
| 05-May-2017 |
Mike Galbraith <efault@gmx.de> |
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
ERROR: "devm_create_dev_dax" [drivers/dax/dax_pmem.ko] undefined! ERROR: "alloc_dax_region" [drivers/dax/dax_pmem.ko] undefined! ERROR: "dax_r
device-dax: Tell kbuild DEV_DAX_PMEM depends on DEV_DAX
ERROR: "devm_create_dev_dax" [drivers/dax/dax_pmem.ko] undefined! ERROR: "alloc_dax_region" [drivers/dax/dax_pmem.ko] undefined! ERROR: "dax_region_put" [drivers/dax/dax_pmem.ko] undefined!
Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.10.14, v4.10.13, v4.10.12, v4.10.11, v4.10.10 |
|
#
7b6be844 |
| 11-Apr-2017 |
Dan Williams <dan.j.williams@intel.com> |
dax: refactor dax-fs into a generic provider of 'struct dax_device' instances
We want dax capable drivers to be able to publish a set of dax operations [1]. However, we do not want to further abuse
dax: refactor dax-fs into a generic provider of 'struct dax_device' instances
We want dax capable drivers to be able to publish a set of dax operations [1]. However, we do not want to further abuse block_devices to advertise these operations. Instead we will attach these operations to a dax device and add a lookup mechanism to go from block device path to a dax device. A dax capable driver like pmem or brd is responsible for registering a dax device, alongside a block device, and then a dax capable filesystem is responsible for retrieving the dax device by path name if it wants to call dax_operations.
For now, we refactor the dax pseudo-fs to be a generic facility, rather than an implementation detail, of the device-dax use case. Where a "dax device" is just an inode + dax infrastructure, and "Device DAX" is a mapping service layered on top of that base 'struct dax_device'. "Filesystem DAX" is then a mapping service that layers a filesystem on top of that same base device. Filesystem DAX is associated with a block_device for now, but perhaps directly to a dax device in the future, or for new pmem-only filesystems.
[1]: https://lkml.org/lkml/2017/1/19/880
Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.10.9 |
|
#
956a4cd2 |
| 07-Apr-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
The following warning triggers with a new unit test that stresses the device-dax interface.
=============================== [ ERR
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
The following warning triggers with a new unit test that stresses the device-dax interface.
=============================== [ ERR: suspicious RCU usage. ] 4.11.0-rc4+ #1049 Tainted: G O ------------------------------- ./include/linux/rcupdate.h:521 Illegal context switch in RCU read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0 2 locks held by fio/9070: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8d0739d7>] __do_page_fault+0x167/0x4f0 #1: (rcu_read_lock){......}, at: [<ffffffffc03fbd02>] dax_dev_huge_fault+0x32/0x620 [dax]
Call Trace: dump_stack+0x86/0xc3 lockdep_rcu_suspicious+0xd7/0x110 ___might_sleep+0xac/0x250 __might_sleep+0x4a/0x80 __alloc_pages_nodemask+0x23a/0x360 alloc_pages_current+0xa1/0x1f0 pte_alloc_one+0x17/0x80 __pte_alloc+0x1e/0x120 __get_locked_pte+0x1bf/0x1d0 insert_pfn.isra.70+0x3a/0x100 ? lookup_memtype+0xa6/0xd0 vm_insert_mixed+0x64/0x90 dax_dev_huge_fault+0x520/0x620 [dax] ? dax_dev_huge_fault+0x32/0x620 [dax] dax_dev_fault+0x10/0x20 [dax] __do_fault+0x1e/0x140 __handle_mm_fault+0x9af/0x10d0 handle_mm_fault+0x16d/0x370 ? handle_mm_fault+0x47/0x370 __do_page_fault+0x28c/0x4f0 trace_do_page_fault+0x58/0x2a0 do_async_page_fault+0x1a/0xa0 async_page_fault+0x28/0x30
Inserting a page table entry may trigger an allocation while we are holding a read lock to keep the device instance alive for the duration of the fault. Use srcu for this keep-alive protection.
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.10.8, v4.10.7, v4.10.6, v4.10.5, v4.10.4, v4.10.3, v4.10.2, v4.10.1, v4.10, v4.9, openbmc-4.4-20161121-1, v4.4.33, v4.4.32, v4.4.31, v4.4.30, v4.4.29, v4.4.28 |
|
#
867dfe34 |
| 25-Oct-2016 |
Arnd Bergmann <arnd@arndb.de> |
nvdimm: make CONFIG_NVDIMM_DAX 'bool'
A bugfix just tried to address a randconfig build problem and introduced a variant of the same problem: with CONFIG_LIBNVDIMM=y and CONFIG_NVDIMM_DAX=m, the nvd
nvdimm: make CONFIG_NVDIMM_DAX 'bool'
A bugfix just tried to address a randconfig build problem and introduced a variant of the same problem: with CONFIG_LIBNVDIMM=y and CONFIG_NVDIMM_DAX=m, the nvdimm module now fails to link:
drivers/nvdimm/built-in.o: In function `to_nd_device_type': bus.c:(.text+0x1b5d): undefined reference to `is_nd_dax' drivers/nvdimm/built-in.o: In function `nd_region_notify_driver_action.constprop.2': region_devs.c:(.text+0x6b6c): undefined reference to `is_nd_dax' region_devs.c:(.text+0x6b8c): undefined reference to `to_nd_dax' drivers/nvdimm/built-in.o: In function `nd_region_probe': region.c:(.text+0x70f3): undefined reference to `nd_dax_create' drivers/nvdimm/built-in.o: In function `mode_show': namespace_devs.c:(.text+0xa196): undefined reference to `is_nd_dax' drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe': (.text+0xa55f): undefined reference to `is_nd_dax' drivers/nvdimm/built-in.o: In function `nvdimm_namespace_common_probe': (.text+0xa56e): undefined reference to `to_nd_dax'
This reverts the earlier fix, making NVDIMM_DAX a 'bool' option again as it should be (it gets linked into the libnvdimm module). To fix the original problem, I'm adding a dependency on LIBNVDIMM to DEV_DAX_PMEM, which ensures we can't have that one built-in if the rest is a module.
Fixes: 4e65e9381c7a ("/dev/dax: fix Kconfig dependency build breakage") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.4.27, v4.7.10, openbmc-4.4-20161021-1, v4.7.9, v4.4.26, v4.7.8, v4.4.25, v4.4.24, v4.7.7, v4.8, v4.4.23, v4.7.6, v4.7.5, v4.4.22, v4.4.21, v4.7.4, v4.7.3, v4.4.20, v4.7.2, v4.4.19, openbmc-4.4-20160819-1, v4.7.1, v4.4.18, v4.4.17, openbmc-4.4-20160804-1, v4.4.16 |
|
#
ba09c01d |
| 24-Jul-2016 |
Dan Williams <dan.j.williams@intel.com> |
dax: convert to the cdev api
A goal of the device-DAX interface is to be able to support many exclusive allocations (partitions) of performance / feature differentiated memory. This count may excee
dax: convert to the cdev api
A goal of the device-DAX interface is to be able to support many exclusive allocations (partitions) of performance / feature differentiated memory. This count may exceed the default minors limit of 256.
As a result of switching to an embedded cdev the inode-to-dax_dev conversion is simplified, as well as reference counting which can switch to the cdev kobject lifetime.
Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v4.7, openbmc-4.4-20160722-1, openbmc-20160722-1, openbmc-20160713-1, v4.4.15, v4.6.4, v4.6.3, v4.4.14, v4.6.2, v4.4.13, openbmc-20160606-1, v4.6.1, v4.4.12, openbmc-20160521-1, v4.4.11, openbmc-20160518-1, v4.6 |
|
#
dee41079 |
| 14-May-2016 |
Dan Williams <dan.j.williams@intel.com> |
/dev/dax, core: file operations and dax-mmap
The "Device DAX" core enables dax mappings of performance / feature differentiated memory. An open mapping or file handle keeps the backing struct devic
/dev/dax, core: file operations and dax-mmap
The "Device DAX" core enables dax mappings of performance / feature differentiated memory. An open mapping or file handle keeps the backing struct device live, but new mappings are only possible while the device is enabled. Faults are handled under rcu_read_lock to synchronize with the enabled state of the device.
Similar to the filesystem-dax case the backing memory may optionally have struct page entries. However, unlike fs-dax there is no support for private mappings, or mappings that are not backed by media (see use of zero-page in fs-dax).
Mappings are always guaranteed to match the alignment of the dax_region. If the dax_region is configured to have a 2MB alignment, all mappings are guaranteed to be backed by a pmd entry. Contrast this determinism with the fs-dax case where pmd mappings are opportunistic. If userspace attempts to force a misaligned mapping, the driver will fail the mmap attempt. See dax_dev_check_vma() for other scenarios that are rejected, like MAP_PRIVATE mappings.
Cc: Hannes Reinecke <hare@suse.de> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
#
ab68f262 |
| 18-May-2016 |
Dan Williams <dan.j.williams@intel.com> |
/dev/dax, pmem: direct access to persistent memory
Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped without need of an
/dev/dax, pmem: direct access to persistent memory
Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows memory ranges to be allocated and mapped without need of an intervening file system. Device DAX is strict, precise and predictable. Specifically this interface:
1/ Guarantees fault granularity with respect to a given page size (pte, pmd, or pud) set at configuration time.
2/ Enforces deterministic behavior by being strict about what fault scenarios are supported.
For example, by forcing MADV_DONTFORK semantics and omitting MAP_PRIVATE support device-dax guarantees that a mapping always behaves/performs the same once established. It is the "what you see is what you get" access mechanism to differentiated memory vs filesystem DAX which has filesystem specific implementation semantics.
Persistent memory is the first target, but the mechanism is also targeted for exclusive allocations of performance differentiated memory ranges.
This commit is limited to the base device driver infrastructure to associate a dax device with pmem range.
Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
show more ...
|
Revision tags: v5.15.2, v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12, v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14, v5.10 |
|
#
a927bd6b |
| 22-Nov-2020 |
Dan Williams <dan.j.williams@intel.com> |
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_ad
mm: fix phys_to_target_node() and memory_add_physaddr_to_nid() exports The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf863e ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|