2bbf2b1c | 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Rework teardown/setup flow to be more common
[ Upstream commit bc90fbe0c3182157d2be100a2f6c2edbb1820677 ]
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). The key missing pieces are the calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The PCIe reset case calls pci_free_irq_vectors() on reset_prepare, but does not call the corresponding pci_alloc_irq_vectors() on reset_done. This causes unexpected/unwanted interrupt behavior because the adminq interrupt is accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors() and get rid of the now unused pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows.
4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init().
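The symmetry described in the four changes above can be sketched in plain userspace C. This is an illustrative model only, not the driver code: `struct core_model`, `model_setup()` and `model_teardown()` are hypothetical names, a boolean stands in for the pci_alloc_irq_vectors()/pci_free_irq_vectors() pair, and a counter stands in for the debugfs "identity" entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver state; not the real struct pdsc. */
struct core_model {
    int nintrs;
    int *intr_info;        /* re-allocated on every setup */
    bool irq_vectors;      /* models the allocated IRQ vectors */
    bool debugfs_identity; /* created once, survives a reset */
    int debugfs_creates;   /* how many times the entry was created */
};

/* Used by both the probe path and the reset path. */
static int model_setup(struct core_model *c, int nintrs)
{
    assert(nintrs > 0);

    c->intr_info = calloc((size_t)nintrs, sizeof(*c->intr_info));
    if (!c->intr_info)
        return -1;
    c->nintrs = nintrs;
    c->irq_vectors = true; /* always allocate vectors in setup */

    /* Only create the debugfs entry when it does not already exist. */
    if (!c->debugfs_identity) {
        c->debugfs_identity = true;
        c->debugfs_creates++;
    }
    return 0;
}

/* Used by both the remove path and the reset path. */
static void model_teardown(struct core_model *c)
{
    free(c->intr_info);     /* always free and clear intr_info */
    c->intr_info = NULL;
    c->nintrs = 0;
    c->irq_vectors = false; /* always free vectors in teardown */
}
```

Because setup and teardown are the only places that touch the vectors, a teardown followed by a setup (the reset case) ends in the same state as a fresh probe.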
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
f6ec6ac9 | 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Clear BARs on reset
[ Upstream commit e96094c1d11cce4deb5da3c0500d49041ab845b8 ]
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed until they are re-mapped.
Also, make sure any places that can access the BARs when they are NULL are prevented.
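A minimal userspace sketch of the idea, assuming a hypothetical `struct bar_model` that caches a mapped register pointer: the pointer is cleared when the mapping goes away, and every accessor checks it before dereferencing. The names and the error value are illustrative and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ENODEV (-1) /* illustrative error code, not a kernel errno */

/* Hypothetical cached BAR mapping; NULL means "currently unmapped". */
struct bar_model {
    volatile uint32_t *regs;
};

/* Called on reset: forget the cached mapping so nobody can use it. */
static void model_clear_bar(struct bar_model *b)
{
    b->regs = NULL;
}

/* Every access path checks the cached pointer before touching it. */
static int model_read_reg(const struct bar_model *b, size_t idx, uint32_t *out)
{
    assert(out != NULL);

    if (!b->regs)
        return MODEL_ENODEV; /* BAR is unmapped, refuse the access */
    *out = b->regs[idx];
    return 0;
}
```

The caller gets an error instead of a stray access while the device is in reset, and the read works again once the pointer is set after re-mapping.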
Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
22cd6046 | 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Prevent race issues involving the adminq
[ Upstream commit 7e82a8745b951b1e794cc780d46f3fbee5e93447 ]
There are multiple paths that can result in using the pdsc's adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or a fw_down/fw_up cycle due to bad PCIe state or bad device state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed.
In order to fix this, add some further state checks and implement reference counting for adminq uses. Reference counting was used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (i.e. pds-vfio-pci) can be using [2] at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again.
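The counting scheme described above can be modeled in userspace with C11 atomics. This is a simplified sketch under stated assumptions, not the driver implementation: `struct adminq_model` and the `model_*` functions are hypothetical, an atomic flag stands in for the PDSC_S_FW_DEAD bit, and the teardown is a single non-blocking attempt that the caller would retry, where the commit message describes the driver as waiting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real struct pdsc. */
struct adminq_model {
    atomic_int refcnt;   /* 0 = no adminq, 1 = idle and usable, >1 = in use */
    atomic_bool fw_dead; /* models the PDSC_S_FW_DEAD bit */
};

/* fw_up: the adminq is allocated and ready, so the count starts at 1. */
static void model_adminq_ready(struct adminq_model *m)
{
    atomic_store(&m->fw_dead, false);
    atomic_store(&m->refcnt, 1);
}

/* A user takes a reference only if the adminq exists and FW is not dead. */
static bool model_adminq_get(struct adminq_model *m)
{
    int old;

    if (atomic_load(&m->fw_dead))
        return false;

    old = atomic_load(&m->refcnt);
    while (old != 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&m->refcnt, &old, old + 1))
            return true;
    }
    return false; /* count is 0: the adminq has been deleted */
}

/* A user drops the reference it took with model_adminq_get(). */
static void model_adminq_put(struct adminq_model *m)
{
    int old = atomic_fetch_sub(&m->refcnt, 1);

    assert(old > 1); /* the base reference is never dropped by a user */
    (void)old;
}

/*
 * fw_down: block new users first, then try to move the count from 1 to 0.
 * Returns false while users still hold references; the caller retries.
 */
static bool model_adminq_try_teardown(struct adminq_model *m)
{
    int expected = 1;

    atomic_store(&m->fw_dead, true);
    return atomic_compare_exchange_strong(&m->refcnt, &expected, 0);
}
```

Setting the dead flag before attempting the 1-to-0 transition is what keeps the count from growing while teardown is pending; a user that raced past the flag check either bumps the count first (so teardown retries) or sees 0 and backs off.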
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
699f5416 | 14-Sep-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: implement pci reset handlers
[ Upstream commit ffa55858330f267beec995fc4f68098c91311c64 ]
Implement the callbacks for a nice PCI reset. These get called when a user is nice enough to use the sysfs PCI reset entry, e.g.
    echo 1 > /sys/bus/pci/devices/0000:2b:00.0/reset
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 7e82a8745b95 ("pds_core: Prevent race issues involving the adminq")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
10839a18 | 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Use struct pdsc for the pdsc_adminq_isr private data
[ Upstream commit 951705151e50f9022bc96ec8b3fd5697380b1df6 ]
The initial design for the adminq interrupt was done based on client drivers having their own adminq and adminq interrupt. So, each client driver's adminq isr would use their specific adminqcq for the private data struct. For the time being the design has changed to only use a single adminq for all clients. So, instead use the struct pdsc for the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
3409e3ad | 13-Nov-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: fix up some format-truncation complaints
[ Upstream commit 7c02f6ae676a954216a192612040f9a0cde3adf7 ]
Our friendly kernel test robot pointed out a couple of potential string truncation issues. We weren't worried about any of them, but they can be fixed relatively easily to quiet the complaints.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310211736.66syyDpp-lkp@intel.com/
Fixes: 45d76f492938 ("pds_core: set up device and adminq")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231113183257.71110-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
0ea064e7 | 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: pass opcode to devcmd_wait
Don't rely on the PCI memory for the devcmd opcode because we read a 0xff value if the PCI bus is broken, which can cause us to report a bogus dev_cmd opcode later.
Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-6-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
969cfd4c | 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: check for work queue before use
Add a check that the wq exists before queuing up work for a failed devcmd, as the PF is responsible for health and the VF doesn't have a wq.
Fixes: c2dbb0904310 ("pds_core: health timer and workqueue")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-5-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
95e38322 | 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: no reset command for VF
The VF doesn't need to send a reset command, and in a PCI reset scenario it might not have a valid IO space to write to anyway.
Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
e48b894a | 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: no health reporter in VF
Make sure the health reporter is set up before we use it in our devlink health updates, especially since the VF doesn't set up the health reporter.
Fixes: 25b450c05a49 ("pds_core: add devlink health facilities")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
7eb6deb3 | 21-Aug-2023 |
Jakub Kicinski <kuba@kernel.org> |
Revert "pds_core: Fix some kernel-doc comments"
This reverts commit cb39c35783f26892bb1a72b1115c94fa2e77f4c5. Patch was applied too hastily; the problem is already fixed in Alex's vfio tree: https://lore.kernel.org/all/20230821112237.105872b5.alex.williamson@redhat.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|