#
54b2fcee |
| 27-Apr-2020 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: remove last_sq_tail
The nvme driver does not have enough tags to wrap the queue, and blk-mq will no longer call commit_rqs() when there are no new submissions to notify.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
74943d45 |
| 28-Apr-2020 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: remove volatile cqes
The completion queue entry is not volatile once the phase is confirmed. Remove the volatile keywords and check the phase using the appropriate READ_ONCE() accessor, allowing the compiler to optimize the remaining completion path.
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
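The phase check described above can be sketched in userspace C. This is only an illustration under stated assumptions: `READ_ONCE_U16` is a stand-in for the kernel's `READ_ONCE()`, `struct cqe` is a simplified, hypothetical entry layout, and endianness conversion is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's READ_ONCE(): forces a single load. */
#define READ_ONCE_U16(x) (*(const volatile uint16_t *)&(x))

/* Hypothetical, simplified completion queue entry (not the driver's layout). */
struct cqe {
	uint32_t result;
	uint16_t sq_head;
	uint16_t sq_id;
	uint16_t command_id;
	uint16_t status;	/* bit 0 is the phase tag */
};

/* True if the entry at 'head' carries the expected phase, i.e. it is new. */
static bool cqe_pending(struct cqe *cqes, uint16_t head, uint8_t phase)
{
	return (READ_ONCE_U16(cqes[head].status) & 1) == phase;
}
```

Only the phase read goes through the volatile access; once the phase is confirmed, the rest of the entry can be read through ordinary pointers, which is what lets the compiler optimize the remaining path.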
|
#
a8de6639 |
| 07-May-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
nvme-pci: fix "slimmer CQ head update"
Pre-incrementing ->cq_head can't be done in memory because another context can observe the transient out-of-bounds value.
This devalues space savings compared to original code :-\
$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32)
Function                  old     new   delta
nvme_poll_irqdisable      464     456      -8
nvme_poll                 455     447      -8
nvme_irq                  388     380      -8
nvme_dev_disable          955     947      -8
But the code is minimal now: one read for head, one read for q_depth, one increment, one comparison, single instruction phase bit update and one write for new head.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: John Garry <john.garry@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Fixes: e2a366a4b0feaeb ("nvme-pci: slimmer CQ head update")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
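The sequence listed above (one read of the head, one read of q_depth, one increment, one comparison, a single phase-bit update and one write of the new head) might look roughly like the following. The struct and function names are hypothetical stand-ins for this sketch, not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical subset of the queue state used for the sketch. */
struct cq_state {
	uint16_t q_depth;
	uint16_t cq_head;
	uint8_t cq_phase;
};

static void update_cq_head(struct cq_state *q)
{
	/* Increment in a local so memory never holds head == q_depth. */
	uint32_t tmp = q->cq_head + 1u;

	if (tmp == q->q_depth) {
		q->cq_head = 0;
		q->cq_phase ^= 1;	/* flip the expected phase on wrap */
	} else {
		q->cq_head = (uint16_t)tmp;
	}
}
```

Because the incremented value lives in a local until it is known to be in range, another context reading `cq_head` sees either the old head or a valid new one.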
|
#
726612b6 |
| 24-Mar-2020 |
Israel Rukshin <israelr@mellanox.com> |
nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl
Put the ctrl reference count at nvme_uninit_ctrl as opposed to nvme_init_ctrl which takes it. This decreases the reference count at the core layer instead of decreasing it on each transport separately. Also move the call to nvme_uninit_ctrl in the PCI driver after the calls to nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference count after using the dev. This is safe because those functions use nvme_dev, which is freed only later in nvme_pci_free_ctrl.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
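The get/put symmetry can be illustrated with a toy reference count. Everything below is hypothetical: a plain `int` replaces the kernel's reference counting, and `init_ctrl`/`uninit_ctrl` only mimic the pairing described in the message.

```c
#include <stdbool.h>

/* Hypothetical controller object; not thread-safe, for illustration only. */
struct ctrl {
	int refs;
	bool freed;
};

static void ctrl_get(struct ctrl *c)
{
	c->refs++;
}

static void ctrl_put(struct ctrl *c)
{
	if (--c->refs == 0)
		c->freed = true;	/* stand-in for the release callback */
}

/* Core-layer init: the creator holds one reference, init takes another. */
static void init_ctrl(struct ctrl *c)
{
	c->refs = 1;
	c->freed = false;
	ctrl_get(c);
}

/* Core-layer uninit: drops exactly the reference init_ctrl() took. */
static void uninit_ctrl(struct ctrl *c)
{
	ctrl_put(c);
}
```

With the pair balanced in the core layer, a transport only has to drop its own creator reference once it has finished using the device structure.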
|
#
b780d741 |
| 24-Mar-2020 |
Israel Rukshin <israelr@mellanox.com> |
nvme: Fix ctrl use-after-free during sysfs deletion
In case nvme_sysfs_delete() is called by the user before taking the ctrl reference count, the ctrl may be freed during the creation and cause the bug. Take the reference as soon as the controller is externally visible, which is done by cdev_device_add() in nvme_init_ctrl(). Also take the reference count at the core layer instead of taking it on each transport separately.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
253fd4ac |
| 24-Mar-2020 |
Israel Rukshin <israelr@mellanox.com> |
nvme-pci: Re-order nvme_pci_free_ctrl
Destroy the resources in the same order as the nvme_probe error flow to improve code readability.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
2db24e4a |
| 09-Mar-2020 |
Max Gurtovoy <maxg@mellanox.com> |
nvme-pci: properly print controller address
Align PCI address print with fabrics address that is printed with newline character.
Before:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0[root@server40 linux]#
After:
[root@server40 linux]# cat /sys/class/nvme/nvme2/address
0000:0b:00.0
[root@server40 linux]#
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
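As a rough illustration of the change, a newline-terminated attribute string can be produced as below. `format_address` is a hypothetical helper written for this sketch; it is not the driver's function.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the PCI address followed by a newline,
 * the way a sysfs-style text attribute is usually terminated.
 * Returns the number of characters written (excluding the NUL).
 */
static int format_address(char *buf, size_t size, const char *pci_name)
{
	return snprintf(buf, size, "%s\n", pci_name);
}
```

Ending the value with a newline is what keeps the shell prompt from being glued to the address in the "After" output.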
|
#
fa059b85 |
| 04-Mar-2020 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: Simplify nvme_poll_irqdisable
The timeout handler can use the existing nvme_poll() if it needs to check a polled queue, allowing nvme_poll_irqdisable() to handle only irq driven queues for the remaining callers.
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
324b494c |
| 02-Mar-2020 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: Remove two-pass completions
Completion handling had been done in two steps: find all new completions under a lock, then handle those completions outside the lock. This was done to make the locked section as short as possible so that other threads using the same lock wait less time.
The driver no longer shares locks during completion, and is in fact lockless for interrupt driven queues, so the optimization no longer serves its original purpose. Replace the two-pass completion queue handler with a single pass that completes entries immediately.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
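A single-pass handler of the kind described above could be sketched as follows. This is a simplified userspace model: `struct cq`, `Q_DEPTH`, `handle_cqe` and `process_cq` are hypothetical names, memory barriers and doorbell writes are left out, and each entry is reduced to its status word.

```c
#include <stdint.h>

#define Q_DEPTH 4	/* hypothetical queue depth for the sketch */

struct cq {
	uint16_t status[Q_DEPTH];	/* bit 0 of each entry is the phase tag */
	uint16_t head;
	uint8_t phase;
	unsigned int handled;
};

/* Stand-in for completing the request that belongs to this entry. */
static void handle_cqe(struct cq *q, uint16_t idx)
{
	(void)idx;
	q->handled++;
}

/* Single pass: complete each new entry as soon as it is found. */
static int process_cq(struct cq *q)
{
	int found = 0;

	while ((q->status[q->head] & 1) == q->phase) {
		uint32_t next = q->head + 1u;

		handle_cqe(q, q->head);
		found++;

		if (next == Q_DEPTH) {
			q->head = 0;
			q->phase ^= 1;	/* expected phase flips on wrap */
		} else {
			q->head = (uint16_t)next;
		}
	}
	return found;
}
```

There is no intermediate start/end range to remember: the loop finds an entry, completes it and advances, and it stops at the first entry whose phase does not match.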
|
#
bf392a5d |
| 02-Mar-2020 |
Keith Busch <kbusch@kernel.org> |
nvme-pci: Remove tag from process cq
The only user for tagged completion was for timeout handling. That user, though, really only cares if the timed out command is completed, which we can safely check within the timeout handler.
Remove the tag check to simplify completion handling.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
e2a366a4 |
| 28-Feb-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
nvme-pci: slimmer CQ head update
Update CQ head with pre-increment operator. This saves subtraction of 1 and a few registers.
Also update phase with "^= 1". This generates only one RMW instruction.
ffffffff815ba150 <nvme_update_cq_head>:
ffffffff815ba150:  0f b7 47 70     movzx  eax,WORD PTR [rdi+0x70]
ffffffff815ba154:  83 c0 01        add    eax,0x1
ffffffff815ba157:  66 89 47 70     mov    WORD PTR [rdi+0x70],ax
ffffffff815ba15b:  66 3b 47 68     cmp    ax,WORD PTR [rdi+0x68]
ffffffff815ba15f:  74 01           je     ffffffff815ba162 <nvme_update_cq_head+0x12>
ffffffff815ba161:  c3              ret
ffffffff815ba162:  31 c0           xor    eax,eax
ffffffff815ba164:  80 77 74 01     ===> xor BYTE PTR [rdi+0x74],0x1
ffffffff815ba168:  66 89 47 70     mov    WORD PTR [rdi+0x70],ax
ffffffff815ba16c:  c3              ret

add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-119 (-119)
Function              old     new   delta
nvme_poll             690     678     -12
nvme_dev_disable     1230    1177     -53
nvme_irq              613     559     -54
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
|
#
9515743b |
| 26-Feb-2020 |
Bijan Mottahedeh <bijan.mottahedeh@oracle.com> |
nvme-pci: Hold cq_poll_lock while completing CQEs
Completions need to be consumed in the same order the controller submitted them, otherwise future completion entries may overwrite ones we haven't handled yet. Hold the nvme queue's poll lock while completing new CQEs to prevent another thread from freeing command tags for reuse out-of-order.
Fixes: dabcefab45d3 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
98f7b86a |
| 12-Feb-2020 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
nvme-pci: Use single IRQ vector for old Apple models
People reported that old Apple machines do not work properly if any IRQ vector other than the first is in use.
Set a quirk for those models to limit IRQ use to the first vector only.
Based on original patch by GitHub user npx001.
Link: https://github.com/Dunedan/mbp-2016-linux/issues/9
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Leif Liddy <leif.liddy@gmail.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
1fae37ac |
| 06-Feb-2020 |
Shyjumon N <shyjumon.n@intel.com> |
nvme/pci: Add sleep quirk for Samsung and Toshiba drives
The Samsung SSD SM981/PM981 and Toshiba SSD KBG40ZNT256G on the Lenovo C640 platform experience runtime resume issues when the SSDs are kept in sleep/suspend mode for a long time.
This patch applies the 'Simple Suspend' quirk to these configurations. With this patch, the issue was not observed in a 1+ day test.
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shyjumon N <shyjumon.n@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
fa46c6fb |
| 12-Feb-2020 |
Keith Busch <kbusch@kernel.org> |
nvme/pci: move cqe check after device shutdown
Many users have reported nvme triggered irq_startup() warnings during shutdown. The driver uses the nvme queue's irq to synchronize scanning for completions, and enabling an interrupt affined to only offline CPUs triggers the alarming warning.
Move the final CQE check to after disabling the device and all registered interrupts have been torn down so that we do not have any IRQ to synchronize.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
cfa27356 |
| 30-Jan-2020 |
Christoph Hellwig <hch@lst.de> |
nvme-pci: remove nvmeq->tags
There is no real need to have a pointer to the tagset in struct nvme_queue, as we only need it in a single place, and that place can derive the used tagset from the device and qid trivially. This fixes a problem with stale pointer exposure when tagsets are reset, and also shrinks the nvme_queue structure. It also matches what most other transports have done since day 1.
Reported-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
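Deriving the tagset from the device and qid instead of caching a pointer might look like the sketch below. All type and field names are hypothetical, and the mapping (qid 0 is the admin queue, I/O queue n uses index n - 1) is an assumption made for the illustration.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced types for the sketch. */
struct tags { int id; };
struct tag_set { struct tags **tags; };

struct dev {
	struct tag_set admin_tagset;
	struct tag_set tagset;
};

struct queue {
	struct dev *dev;
	unsigned int qid;
};

/*
 * Look the tags up on demand rather than storing a pointer in the
 * queue. Assumes qid 0 is the admin queue and I/O queue n maps to
 * index n - 1 of the I/O tagset.
 */
static struct tags *queue_tags(const struct queue *q)
{
	if (q->qid == 0)
		return q->dev->admin_tagset.tags[0];
	return q->dev->tagset.tags[q->qid - 1];
}
```

Since nothing is cached in the queue, there is no stored pointer to go stale when a tagset is reset.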
|
#
7e4c6b9a |
| 05-Dec-2019 |
Keith Busch <kbusch@kernel.org> |
nvme/pci: Fix read queue count
If nvme.write_queues equals the number of CPUs, the driver had decreased the number of interrupts available such that there could only be one read queue even if the controller could support more. Remove the interrupt count reduction in this case. The driver wouldn't request more IRQs than it wants queues anyway.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
17c33167 |
| 06-Dec-2019 |
Keith Busch <kbusch@kernel.org> |
nvme/pci Limit write queue sizes to possible cpus
The driver can never use more queues of any type than the number of possible CPUs, so a higher value causes the driver to allocate more memory for IO queues than it could ever use. Limit the parameter at module load time to the number of possible cpus.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
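One way to bound a queue-count parameter when it is set is sketched below in userspace C. `parse_queue_count` and `max_cpus` are hypothetical, and rejecting an oversized value (rather than clamping it) is an assumption of this sketch; the message only says the parameter is limited.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical setter: parse 'val' as an unsigned decimal count and
 * refuse anything above 'max_cpus'. Returns 0 on success, -EINVAL
 * otherwise. Negative input wraps to a huge value and is refused too.
 */
static int parse_queue_count(const char *val, unsigned int max_cpus,
			     unsigned int *out)
{
	char *end;
	unsigned long n;

	errno = 0;
	n = strtoul(val, &end, 10);
	if (errno || end == val || *end != '\0' || n > max_cpus)
		return -EINVAL;

	*out = (unsigned int)n;
	return 0;
}
```

Checking the value at the point where it is set means the rest of the code can size its queue arrays without re-validating the count.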
|
#
3f68baf7 |
| 06-Dec-2019 |
Keith Busch <kbusch@kernel.org> |
nvme/pci: Fix write and poll queue types
The number of poll or write queues should never be negative. Use unsigned types so that it's not possible to have the driver not allocate any queues.
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
f6c4d97b |
| 02-Dec-2019 |
Keith Busch <kbusch@kernel.org> |
nvme/pci: Remove last_cq_head
We had been saving the last_cq_head seen from an interrupt so that a polled queue wouldn't mistakenly trigger spurious interrupt detection. We don't poll interrupt driven queues any more, so saving this value is pointless.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
c80b36cd |
| 25-Nov-2019 |
Edmund Nadolski <edmund.nadolski@intel.com> |
nvme: else following return is not needed
Remove unnecessary keyword in nvme_create_queue().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Edmund Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
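The pattern being cleaned up can be shown with a small, hypothetical function (`check_result` is not from the driver): once a branch returns, the code after it is only reached when that branch was not taken, so the `else` adds nothing.

```c
/*
 * Hypothetical example of the style fix: negative values are errors
 * and are returned as-is, positive values are reported as 1.
 */
static int check_result(int result)
{
	if (result < 0)
		return result;
	if (result)		/* formerly written as "else if (result)" */
		return 1;
	return 0;
}
```

Both spellings behave identically; dropping the keyword only flattens the control flow.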
|
#
6c6aa2f2 |
| 14-Nov-2019 |
Akinobu Mita <akinobu.mita@gmail.com> |
nvme: hwmon: add quirk to avoid changing temperature threshold
This adds a new quirk NVME_QUIRK_NO_TEMP_THRESH_CHANGE to avoid changing the value of the temperature threshold feature for specific devices that show undesirable behavior.
Guenter reported:
"On my Intel NVME drive (SSDPEKKW512G7), writing any minimum limit on the Composite temperature sensor results in a temperature warning, and that warning is sticky until I reset the controller.
It doesn't seem to matter which temperature I write; writing -273000 has the same result."
The Intel NVMe has the latest firmware version installed, so this isn't a problem that was ever fixed.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
05d3046f |
| 24-Oct-2019 |
Geert Uytterhoeven <geert+renesas@glider.be> |
nvme-pci: Spelling s/resdicovered/rediscovered/
Fix misspelling of "rediscovered".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
16686f3a |
| 13-Oct-2019 |
Max Gurtovoy <maxg@mellanox.com> |
nvme: move common call to nvme_cleanup_cmd to core layer
nvme_cleanup_cmd should be called for each call to nvme_setup_cmd (symmetrical functions). Move the call for nvme_cleanup_cmd to the common core layer and call it during nvme_complete_rq for the good flow. For error flow, each transport will call nvme_cleanup_cmd independently. Also take care of a special case of path failure, where we call nvme_complete_rq without doing nvme_setup_cmd.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
58a8df67 |
| 13-Oct-2019 |
Israel Rukshin <israelr@mellanox.com> |
nvme: introduce nvme_is_aen_req function
This function improves code readability and reduces code duplication.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|