#
3710e2b0 |
| 21-Apr-2023 |
Adrian Huang <ahuang12@lenovo.com> |
nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
When running the fio test on a 448-core AMD server + an NVMe disk, a soft lockup or a hard lockup call trace is shown:
[soft lockup]
watchdog: BUG: soft lockup - CPU#126 stuck for 23s! [swapper/126:0]
RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x50
...
Call Trace:
 <IRQ>
 fq_flush_timeout+0x7d/0xd0
 ? __pfx_fq_flush_timeout+0x10/0x10
 call_timer_fn+0x2e/0x150
 run_timer_softirq+0x48a/0x560
 ? __pfx_fq_flush_timeout+0x10/0x10
 ? clockevents_program_event+0xaf/0x130
 __do_softirq+0xf1/0x335
 irq_exit_rcu+0x9f/0xd0
 sysvec_apic_timer_interrupt+0xb4/0xd0
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1f/0x30
...
Obviously, fq_flush_timeout() spends over 20 seconds. Here is the ftrace log:
               |  fq_flush_timeout() {
               |    fq_ring_free() {
               |      put_pages_list() {
  0.170 us     |        free_unref_page_list();
  0.810 us     |      }
               |      free_iova_fast() {
               |        free_iova() {
 * 85622.66 us |          _raw_spin_lock_irqsave();
  2.860 us     |          remove_iova();
  0.600 us     |          _raw_spin_unlock_irqrestore();
  0.470 us     |          lock_info_report();
  2.420 us     |          free_iova_mem.part.0();
 * 85638.27 us |        }
 * 85638.84 us |      }
               |      put_pages_list() {
  0.230 us     |        free_unref_page_list();
  0.470 us     |      }
               |      ...
               |      ...
 $ 31017069 us |  }
Most of the cores are under lock contention for acquiring iova_rbtree_lock due to the iova flush queue mechanism.
[hard lockup]
NMI watchdog: Watchdog detected hard LOCKUP on cpu 351
RIP: 0010:native_queued_spin_lock_slowpath+0x2d8/0x330
Call Trace:
 <IRQ>
 _raw_spin_lock_irqsave+0x4f/0x60
 free_iova+0x27/0xd0
 free_iova_fast+0x4d/0x1d0
 fq_ring_free+0x9b/0x150
 iommu_dma_free_iova+0xb4/0x2e0
 __iommu_dma_unmap+0x10b/0x140
 iommu_dma_unmap_sg+0x90/0x110
 dma_unmap_sg_attrs+0x4a/0x50
 nvme_unmap_data+0x5d/0x120 [nvme]
 nvme_pci_complete_batch+0x77/0xc0 [nvme]
 nvme_irq+0x2ee/0x350 [nvme]
 ? __pfx_nvme_pci_complete_batch+0x10/0x10 [nvme]
 __handle_irq_event_percpu+0x53/0x1a0
 handle_irq_event_percpu+0x19/0x60
 handle_irq_event+0x3d/0x60
 handle_edge_irq+0xb3/0x210
 __common_interrupt+0x7f/0x150
 common_interrupt+0xc5/0xf0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x2b/0x40
...
ftrace shows that fq_ring_free() spends over 10 seconds [1]. Again, most of the cores are under lock contention for acquiring iova_rbtree_lock due to the iova flush queue mechanism.
[Root Cause]
The root cause is that max_hw_sectors_kb of the NVMe disk (mdts=10) is 4096kb, so its streaming DMA mappings cannot benefit from the scalable IOVA mechanism introduced by commit 9257b4a206fc ("iommu/iova: introduce per-cpu caching to iova allocation"), which only covers mappings no larger than 128kb.
To fix the lock contention issue, clamp max_hw_sectors based on the DMA optimized limitation in order to leverage the scalable IOVA mechanism.
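In code terms, the idea is to cap the transfer size at what dma_opt_mapping_size() reports rather than the DMA mapping maximum (a sketch of the idea; the surrounding call site is illustrative, not the verbatim patch):

	/*
	 * Clamp max_hw_sectors (512-byte units) to the size the DMA
	 * layer can map efficiently, so huge MDTS values no longer
	 * force IOVA allocations past the per-cpu caches.
	 */
	dev->ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ << 1,
					 dma_opt_mapping_size(&pdev->dev) >> 9);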
Note: the issue does not happen with another NVMe disk (mdts = 5 and max_hw_sectors_kb = 128).
[1] https://gist.github.com/AdrianHuang/bf8ec7338204837631fbdaed25d19cc4
Suggested-by: Keith Busch <kbusch@kernel.org>
Reported-and-tested-by: Jiwei Sun <sunjw10@lenovo.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
bd375fee |
| 25-Apr-2023 |
Hristo Venev <hristo@venev.name> |
nvme-pci: add quirk for missing secondary temperature thresholds
On Kingston KC3000 and Kingston FURY Renegade (both have the same PCI IDs) accessing temp3_{min,max} fails with an invalid field error (note that there is no problem setting the thresholds for temp1).
This contradicts the NVM Express Base Specification 2.0b, page 292:
The over temperature threshold and under temperature threshold features shall be implemented for all implemented temperature sensors (i.e., all Temperature Sensor fields that report a non-zero value).
Define NVME_QUIRK_NO_SECONDARY_TEMP_THRESH, which disables the thresholds for all but the composite temperature, and set it for this device.
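A minimal sketch of how the quirk can gate the hwmon threshold attributes (the helper below is hypothetical; the real check lives in the hwmon .is_visible() logic):

	/* Channel 0 is the composite temperature; the quirk only hides
	 * the min/max thresholds of the secondary sensors. */
	static bool nvme_temp_thresh_visible(struct nvme_ctrl *ctrl, int channel)
	{
		if (channel == 0)
			return true;
		return !(ctrl->quirks & NVME_QUIRK_NO_SECONDARY_TEMP_THRESH);
	}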
Signed-off-by: Hristo Venev <hristo@venev.name>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
1616d6c3 |
| 03-May-2023 |
Sagi Grimberg <sagi@grimberg.me> |
nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G
Add a quirk to fix HS-SSD-FUTURE 2048G SSD drives reporting duplicate nsids.
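Quirks like this amount to a one-line entry in the driver's PCI ID table; a sketch with placeholder IDs (the real vendor/device IDs are in the patch):

	{ PCI_DEVICE(0xabcd, 0x5678),	/* placeholder IDs: HS-SSD-FUTURE 2048G */
		.driver_data = NVME_QUIRK_BOGUS_NID, },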
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217384
Reported-by: Andrey God <andreygod83@protonmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
74391b3e |
| 13-Apr-2023 |
Duy Truong <dory@dory.moe> |
nvme-pci: add NVME_QUIRK_BOGUS_NID for T-FORCE Z330 SSD
Added a quirk to fix the TeamGroup T-Force Cardea Zero Z330 SSDs reporting duplicate NGUIDs.
Signed-off-by: Duy Truong <dory@dory.moe>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
1ad11eaf |
| 07-Mar-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
nvme-pci: drop redundant pci_enable_pcie_error_reporting()
pci_enable_pcie_error_reporting() enables the device to send ERR_* Messages. Since f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), the PCI core does this for all devices during enumeration, so the driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the driver. Also remove the corresponding pci_disable_pcie_error_reporting() from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_* Message may cause the Root Port to generate an interrupt, depending on the AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
1231363a |
| 26-Mar-2023 |
Juraj Pecigos <kernel@juraj.dev> |
nvme-pci: mark Lexar NM760 as IGNORE_DEV_SUBNQN
A system with more than one of these SSDs will only have one usable. The kernel fails to detect more than one nvme device due to duplicate cntlids.
before:
[    9.395229] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    9.395262] nvme nvme0: pci function 0000:01:00.0
[    9.395282] nvme 0000:03:00.0: platform quirk: setting simple suspend
[    9.395305] nvme nvme1: pci function 0000:03:00.0
[    9.409873] nvme nvme0: Duplicate cntlid 1 with nvme1, subsys nqn.2022-07.com.siliconmotion:nvm-subsystem-sn- , rejecting
[    9.409982] nvme nvme0: Removing after probe failure status: -22
[    9.427487] nvme nvme1: allocated 64 MiB host memory buffer.
[    9.445088] nvme nvme1: 16/0/0 default/read/poll queues
[    9.449898] nvme nvme1: Ignoring bogus Namespace Identifiers
after:
[    1.161890] nvme 0000:01:00.0: platform quirk: setting simple suspend
[    1.162660] nvme nvme0: pci function 0000:01:00.0
[    1.162684] nvme 0000:03:00.0: platform quirk: setting simple suspend
[    1.162707] nvme nvme1: pci function 0000:03:00.0
[    1.191354] nvme nvme0: allocated 64 MiB host memory buffer.
[    1.193378] nvme nvme1: allocated 64 MiB host memory buffer.
[    1.211044] nvme nvme1: 16/0/0 default/read/poll queues
[    1.211080] nvme nvme0: 16/0/0 default/read/poll queues
[    1.216145] nvme nvme0: Ignoring bogus Namespace Identifiers
[    1.216261] nvme nvme1: Ignoring bogus Namespace Identifiers
Adding the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk resolves the issue.
Signed-off-by: Juraj Pecigos <kernel@juraj.dev>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b65d44fa |
| 13-Mar-2023 |
Philipp Geulen <p.geulen@js-elektronik.de> |
nvme-pci: add NVME_QUIRK_BOGUS_NID for Lexar NM620
Added a quirk to fix Lexar NM620 1TB SSD reporting duplicate NGUIDs.
Signed-off-by: Philipp Geulen <p.geulen@js-elektronik.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
9630d806 |
| 08-Mar-2023 |
Elmer Miroslav Mosher Golovin <miroslav@mishamosher.com> |
nvme-pci: add NVME_QUIRK_BOGUS_NID for Netac NV3000
Added a quirk to fix the Netac NV3000 SSD reporting duplicate NGUIDs.
Cc: <stable@vger.kernel.org>
Signed-off-by: Elmer Miroslav Mosher Golovin <miroslav@mishamosher.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
a61d2655 |
| 08-Mar-2023 |
Irvin Cote <irvincoteg@gmail.com> |
nvme-pci: fixing memory leak in probe teardown path
In case the nvme_probe teardown path is triggered, the ctrl ref count does not reach 0, thus creating a memory leak upon failure of nvme_probe.
Signed-off-by: Irvin Cote <irvincoteg@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
e917a849 |
| 16-Feb-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: refresh visible attrs for cmb attributes
The sysfs group containing the cmb attributes is registered before the driver knows if they need to be visible or not. Update the group when cmb attributes are known to exist so the visibility setting is correct.
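One way to implement this (a sketch; the helper name and group symbol are assumptions patterned on the driver, not necessarily the verbatim patch) is to re-run the group's visibility callbacks via sysfs_update_group() once the CMB is mapped:

	/* Re-evaluate attribute visibility now that the CMB state is
	 * known; sysfs_update_group() re-runs .is_visible() for every
	 * attribute in the group. */
	static void nvme_update_attrs(struct nvme_dev *dev)
	{
		sysfs_update_group(&dev->ctrl.device->kobj,
				   &nvme_pci_dev_attrs_group);
	}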
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217037
Fixes: 86adbf0cdb9ec65 ("nvme: simplify transport specific device attribute handling")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
b6c0c237 |
| 10-Feb-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: remove iod use_sgls
It's not used anywhere anymore, so remove it.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
8f0edf45 |
| 10-Feb-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: fix freeing single sgl
There may only be a single DMA mapped entry from multiple physical segments, which means we don't allocate a separate SGL list. Check the number of allocations beforehand to know whether we need to free something.
Freeing a single list allocation is the same for both PRP and SGL usages, so we don't need to check the use_sgl flag anymore.
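A sketch of the resulting free path (field and helper names are assumptions patterned on the surrounding driver code, not the verbatim patch):

	if (iod->nr_allocations < 0)
		return;		/* no list was allocated, nothing to free */
	if (iod->nr_allocations == 1)
		dma_pool_free(dev->prp_page_pool, iod->list[0].sg_list,
			      iod->first_dma);	/* same path for PRP and SGL */
	else
		nvme_free_prps(dev, req);	/* multiple chained PRP lists */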
Fixes: 01df742d8c5c0 ("nvme-pci: remove SGL segment descriptors")
Reported-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
|
#
dc785d69 |
| 09-Feb-2023 |
Irvin Cote <irvin.cote@insa-lyon.fr> |
nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev
Don't mix NULL and ERR_PTR returns.
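The convention, in a minimal sketch (the allocation shown is illustrative):

	/* In nvme_pci_alloc_dev(): every failure returns an ERR_PTR,
	 * never NULL ... */
	dev = kzalloc_node(sizeof(*dev), GFP_KERNEL, dev_to_node(&pdev->dev));
	if (!dev)
		return ERR_PTR(-ENOMEM);

	/* ... so the caller needs only one error idiom: */
	dev = nvme_pci_alloc_dev(pdev, id);
	if (IS_ERR(dev))
		return PTR_ERR(dev);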
Fixes: 2e87570be9d2 ("nvme-pci: factor out a nvme_pci_alloc_dev helper")
Signed-off-by: Irvin Cote <irvin.cote@insa-lyon.fr>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
924bd96e |
| 12-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: set the DMA mask earlier
Set the DMA mask before calling dma_addressing_limited, which depends on it.
Note that this stops checking the return value of dma_set_mask_and_coherent, as this function can only fail for masks smaller than 32 bits.
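The ordering requirement, sketched (the consumer branch is a placeholder, not the driver's actual use of the result):

	/*
	 * dma_addressing_limited() reads the device's DMA mask, so the
	 * mask must be set first.  dma_set_mask_and_coherent() cannot
	 * fail for masks of 32 bits or more, so its return value is
	 * deliberately ignored.
	 */
	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (dma_addressing_limited(&pdev->dev))
		dev_info(&pdev->dev, "DMA addressing is limited\n");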
Fixes: 3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
Reported-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Michael Kelley <mikelley@microsoft.com>
|
#
5f69f009 |
| 08-Feb-2023 |
Daniel Wagner <dwagner@suse.de> |
nvme-pci: add bogus ID quirk for ADATA SX6000PNP
Yet another device which needs a quirk:
nvme nvme1: globally duplicate IDs for nsid 1
nvme nvme1: VID:DID 10ec:5763 model:ADATA SX6000PNP firmware:V9002s94
Link: http://bugzilla.opensuse.org/show_bug.cgi?id=1207827
Reported-by: Gustavo Freitas <freitasmgustavo@gmail.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
7846c1b5 |
| 05-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: place descriptor addresses in iod
The 'struct nvme_iod' space is appended at the end of the preallocated 'struct request', and padded to the cache line size. This leaves some free memory (in most kernel configs) up for grabs.
Instead of appending the nvme data descriptor addresses after the scatterlist, inline these for free within struct nvme_iod. There is now enough space in the mempool for 128 possible segments.
And without increasing the size of the preallocated requests, we can hold up to 5 PRP descriptor elements, allowing the driver to increase its max transfer size to 8MB.
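Conceptually (a sketch patterned on the driver; the exact field layout may differ):

	/* One union covers both descriptor flavors; a small fixed array
	 * of these is inlined in the iod instead of living behind the
	 * scatterlist. */
	union nvme_descriptor {
		struct nvme_sgl_desc	*sg_list;
		__le64			*prp_list;
	};

	#define NVME_MAX_NR_ALLOCATIONS	5	/* 5 PRP lists -> 8MB transfers */

	struct nvme_iod {
		/* ... existing fields ... */
		union nvme_descriptor	list[NVME_MAX_NR_ALLOCATIONS];
	};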
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
ae582935 |
| 05-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: use mapped entries for sgl decision
The driver uses the dma entries for setting up its command's SGL/PRP lists. The dma mapping might have fewer entries than the physical segments, so check the dma mapped count to determine which nvme data layout method is more optimal.
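Sketched at the call site (names follow the surrounding driver code; treat them as illustrative): dma_map_sgtable() may coalesce entries, so the decision keys off the mapped count rather than blk_rq_nr_phys_segments():

	rc = dma_map_sgtable(dev->dev, &iod->sgt, rq_dma_dir(req),
			     DMA_ATTR_NO_WARN);
	if (rc)
		goto out_free_sg;

	/* Decide PRP vs SGL from the DMA-mapped entry count, which an
	 * IOMMU may have made smaller than the physical segment count. */
	if (nvme_pci_use_sgls(dev, req, iod->sgt.nents))
		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw);
	else
		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);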
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
01df742d |
| 05-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: remove SGL segment descriptors
The max segments this driver can see is 127, well below the 256 threshold needed to add an nvme sgl segment descriptor. Remove all the useless checks and dead code.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
5a5754a4 |
| 24-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: flush initial scan_work for async probe
The nvme device may have a namespace with the root partition, so make sure we've completed scanning before returning from the async probe.
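The fix is essentially a flush_work() on the controller's scan_work in the async probe callback (close to the patch; treat as a sketch):

	static void nvme_async_probe(void *data, async_cookie_t cookie)
	{
		struct nvme_dev *dev = data;

		/* Wait for the initial namespace scan so a namespace
		 * holding the root partition exists before async probe
		 * synchronization releases the boot process. */
		flush_work(&dev->ctrl.scan_work);
		nvme_put_ctrl(&dev->ctrl);
	}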
Fixes: eac3ef262941 ("nvme-pci: split the initial probe from the rest path")
Reported-by: Klaus Jensen <its@irrelevant.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Klaus Jensen <k.jensen@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
1c584208 |
| 18-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: fix timeout request state check
Polling the completion can progress the request state to IDLE, either inline with the completion, or through softirq. Either way, the state may not be COMPLETED, so don't check for that. We only care if the state isn't IN_FLIGHT.
This is fixing an issue where the driver aborts an IO that we just completed. Seeing the "aborting" message instead of "polled" is very misleading as to where the timeout problem resides.
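The check then becomes (a sketch of the idea; the message text is illustrative):

	/* After polling, any state other than IN_FLIGHT means the
	 * request completed -- either COMPLETED, or already recycled
	 * to IDLE via softirq -- so it must not be aborted. */
	nvme_poll_irqdisable(nvmeq);
	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
		dev_warn(dev->ctrl.device,
			 "I/O %d QID %d timeout, completion polled\n",
			 req->tag, nvmeq->qid);
		return BLK_EH_DONE;
	}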
Fixes: bf392a5dc02a9b ("nvme-pci: Remove tag from process cq")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
09113abf |
| 29-Dec-2022 |
Tong Zhang <ztong0001@gmail.com> |
nvme-pci: fix error handling in nvme_pci_enable()
There are two issues in nvme_pci_enable():
1) If pci_alloc_irq_vectors() fails, the device is left enabled. Fix this by adding a goto disable statement.

2) nvme_pci_configure_admin_queue() could return -ENODEV; in this case, we need to free the IRQ properly. Otherwise the following warning could be triggered:
[    5.286752] WARNING: CPU: 0 PID: 33 at kernel/irq/irqdomain.c:253 irq_domain_remove+0x12d/0x140
[    5.290547] Call Trace:
[    5.290626]  <TASK>
[    5.290695]  msi_remove_device_irq_domain+0xc9/0xf0
[    5.290843]  msi_device_data_release+0x15/0x80
[    5.290978]  release_nodes+0x58/0x90
[    5.293788] WARNING: CPU: 0 PID: 33 at kernel/irq/msi.c:276 msi_device_data_release+0x76/0x80
[    5.297573] Call Trace:
[    5.297651]  <TASK>
[    5.297719]  release_nodes+0x58/0x90
[    5.297831]  devres_release_all+0xef/0x140
[    5.298339]  device_unbind_cleanup+0x11/0xc0
[    5.298479]  really_probe+0x296/0x320
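In outline, the corrected error paths look like this (a sketch; label names and the surrounding code are illustrative):

	result = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
	if (result < 0)
		goto disable;		/* issue 1: don't leave the device enabled */

	result = nvme_pci_configure_admin_queue(dev);
	if (result)
		goto free_irq;		/* issue 2: -ENODEV must release the vector */

	return 0;

free_irq:
	pci_free_irq_vectors(pdev);
disable:
	pci_disable_device(pdev);
	return result;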
Fixes: a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable")
Co-developed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
453116a4 |
| 04-Jan-2023 |
Hector Martin <marcan@marcan.st> |
nvme-pci: add NVME_QUIRK_IDENTIFY_CNS quirk to Apple T2 controllers
This mirrors the quirk added to Apple Silicon controllers in apple.c. These controllers do not support the Active NS ID List command and behave identically to the SoC version judging by existing user reports/syslogs, so will need the same fix. This quirk reverts back to NVMe 1.0 behavior and disables the broken commands.
Fixes: 811f4de0344d ("nvme: avoid fallback to sequential scan due to transient issues")
Signed-off-by: Hector Martin <marcan@marcan.st>
Tested-by: Orlando Chamberlain <orlandoch.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
88d356ca |
| 25-Dec-2022 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: update sqsize when adjusting the queue depth
Update the core sqsize field in addition to the PCIe-specific q_depth field as the core tagset allocation helpers rely on it.
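The shape of the fix, in the CMB queue-depth clamp (close to the patch, but treat details as illustrative):

	if (dev->cmb_use_sqes) {
		result = nvme_cmb_qdepth(dev, nr_io_queues,
				sizeof(struct nvme_command));
		if (result > 0) {
			dev->q_depth = result;
			/* keep the core's 0's-based sqsize in sync */
			dev->ctrl.sqsize = result - 1;
		} else {
			dev->cmb_use_sqes = false;
		}
	}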
Fixes: 0da7feaa5913 ("nvme-pci: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hughd@google.com>
Link: https://lore.kernel.org/r/20221225103234.226794-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
84173423 |
| 19-Dec-2022 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: fix page size checks
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE.
Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
c89a529e |
| 19-Dec-2022 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: fix mempool alloc size
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries.
The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool.
While unlikely to occur (you'd need a 4MB I/O in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence.
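A worked version of the arithmetic (a sketch with the driver's constants, NVME_MAX_KB_SZ = 4096 kilobytes and NVME_CTRL_PAGE_SIZE = 4096 bytes; the exact helper in pci.c may differ):

	/* Worst case: the maximum transfer plus one page of
	 * misalignment slack, expressed in bytes. */
	unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE;
	unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);
			/* = 1025 PRP entries of 8 bytes each */

	/* The buggy version effectively computed
	 *	DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE,
	 *		     NVME_CTRL_PAGE_SIZE) == 2
	 * because it mixed kilobytes with bytes, which rounds down to
	 * a single PRP list; 1025 entries actually span more than one. */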
Cc: Jens Axboe <axboe@kernel.dk>
Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|