Revision tags: v6.6.25, v6.6.24, v6.6.23, v6.6.16, v6.6.15 |
|
#
2bbf2b1c |
| 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Rework teardown/setup flow to be more common
[ Upstream commit bc90fbe0c3182157d2be100a2f6c2edbb1820677 ]
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key piece that's missing is the pair of calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The PCIe reset case is calling pci_free_irq_vectors() on reset_prepare, but not calling the corresponding pci_alloc_irq_vectors() on reset_done. This is causing unexpected/unwanted interrupt behavior due to the adminq interrupt being accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors() and get rid of the now unused pds_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows.
4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa55858330f ("pds_core: implement pci reset handlers")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
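The symmetry this commit aims for can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct fake_dev and the dev_* helpers are hypothetical stand-ins, not the driver's functions, and the counters merely imitate interrupt vector allocation and the debugfs "identity" entry.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device; not the real struct pdsc. */
struct fake_dev {
    int irq_allocs;       /* outstanding vector allocations, should be 0 or 1 */
    bool identity_entry;  /* stands in for the debugfs "identity" entry */
    int identity_creates; /* how many times the entry was created */
};

/* Common setup: always allocate vectors, create the entry only if absent. */
static void dev_setup(struct fake_dev *d)
{
    d->irq_allocs++;
    if (!d->identity_entry) {
        d->identity_entry = true;
        d->identity_creates++;
    }
}

/* Common teardown: always release vectors, tolerate a repeated call. */
static void dev_teardown(struct fake_dev *d)
{
    if (d->irq_allocs > 0)
        d->irq_allocs--;
}

/* Probe, reset and remove all go through the same two helpers. */
static void dev_probe(struct fake_dev *d)
{
    dev_setup(d);
}

static void dev_reset(struct fake_dev *d)
{
    dev_teardown(d);
    dev_setup(d);
}

static void dev_remove(struct fake_dev *d)
{
    dev_teardown(d);
    d->identity_entry = false;
}
```

Because every path pairs one teardown with one setup, the model never ends a reset with the vectors freed but not reallocated, which is the imbalance the commit message describes.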
|
#
22cd6046 |
| 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Prevent race issues involving the adminq
[ Upstream commit 7e82a8745b951b1e794cc780d46f3fbee5e93447 ]
There are multiple paths that can result in using the pdsc's adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through reset via PCIe reset and/or a fw_down/fw_up cycle due to bad PCIe state or bad device state the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed.
In order to fix this, add some further state checks and implement reference counting for adminq uses. Reference counting was used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (i.e. pds-vfio-pci) can be using [2] at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
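The reference-counting scheme described in this commit can be sketched in user-space C with C11 atomics. This is an illustration of the idea rather than the driver's code: struct adminq_state and the adminq_* helpers are hypothetical names, the fw_dead flag stands in for the PDSC_S_FW_DEAD bit, and instead of waiting for the count to reach 1 the sketch lets the caller retry the delete.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the real struct pdsc. */
struct adminq_state {
    atomic_int refcnt;   /* 0 = deleted, 1 = ready and idle, >1 = in use */
    atomic_bool fw_dead; /* stands in for the PDSC_S_FW_DEAD bit */
};

/* Setup or fw_up: the adminq is allocated and ready to use. */
static void adminq_ready(struct adminq_state *s)
{
    atomic_store(&s->fw_dead, false);
    atomic_store(&s->refcnt, 1);
}

/* Take a reference; fails once fw_down has started or the queue is gone. */
static bool adminq_get(struct adminq_state *s)
{
    int cur;

    if (atomic_load(&s->fw_dead))
        return false;

    cur = atomic_load(&s->refcnt);
    while (cur != 0) {
        /* Increment only while the count is still non-zero. */
        if (atomic_compare_exchange_weak(&s->refcnt, &cur, cur + 1))
            return true;
    }
    return false;
}

/* Drop a reference taken with adminq_get(). */
static void adminq_put(struct adminq_state *s)
{
    atomic_fetch_sub(&s->refcnt, 1);
}

/*
 * fw_down: block new users first, then clear the count only if no user
 * still holds a reference. Returns false if the caller must retry later.
 */
static bool adminq_try_delete(struct adminq_state *s)
{
    int expected = 1;

    atomic_store(&s->fw_dead, true);
    return atomic_compare_exchange_strong(&s->refcnt, &expected, 0);
}
```

Setting the flag before testing the count mirrors the ordering in the commit message: no new reference can be taken once teardown has begun, so the count can only fall towards 1.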
|
Revision tags: v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4 |
|
#
699f5416 |
| 14-Sep-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: implement pci reset handlers
[ Upstream commit ffa55858330f267beec995fc4f68098c91311c64 ]
Implement the callbacks for a nice PCI reset. These get called when a user is nice enough to use the sysfs PCI reset entry, e.g.
    echo 1 > /sys/bus/pci/devices/0000:2b:00.0/reset
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stable-dep-of: 7e82a8745b95 ("pds_core: Prevent race issues involving the adminq")
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
#
10839a18 |
| 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Use struct pdsc for the pdsc_adminq_isr private data
[ Upstream commit 951705151e50f9022bc96ec8b3fd5697380b1df6 ]
The initial design for the adminq interrupt was done based on client drivers having their own adminq and adminq interrupt. So, each client driver's adminq isr would use their specific adminqcq for the private data struct. For the time being the design has changed to only use a single adminq for all clients. So, instead use the struct pdsc for the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
#
b2662814 |
| 29-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Cancel AQ work on teardown
[ Upstream commit d321067e2cfa4d5e45401a00912ca9da8d1af631 ]
There is a small window where pdsc_work_thread() calls pdsc_process_adminq() and pdsc_process_adminq() passes the PDSC_S_STOPPING_DRIVER check and starts to process adminq/notifyq work and then the driver starts a fw_down cycle. This could cause some undefined behavior if the notifyqcq/adminqcq are freed while pdsc_process_adminq() is running. Use cancel_work_sync() on the adminqcq's work struct to make sure any pending work items are cancelled and any in progress work items are completed.
Also, make sure to not call cancel_work_sync() if the work item has not been initialized. Without this, traces will happen in cases where a reset fails and teardown is called again or if reset fails and the driver is removed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
Revision tags: v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48 |
|
#
95e38322 |
| 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: no reset command for VF
The VF doesn't need to send a reset command, and in a PCI reset scenario it might not have a valid IO space to write to anyway.
Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-4-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e48b894a |
| 24-Aug-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: no health reporter in VF
Make sure the health reporter is set up before we use it in our devlink health updates, especially since the VF doesn't set up the health reporter.
Fixes: 25b450c05a49 ("pds_core: add devlink health facilities")
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230824161754.34264-3-shannon.nelson@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
Revision tags: v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36 |
|
#
906a76cc |
| 27-Jun-2023 |
Julia Lawall <Julia.Lawall@inria.fr> |
pds_core: use vmalloc_array and vcalloc
Use vmalloc_array and vcalloc to protect against multiplication overflows.
The changes were done using the following Coccinelle semantic patch:
// <smpl>
@initialize:ocaml@
@@

let rename alloc =
  match alloc with
    "vmalloc" -> "vmalloc_array"
  | "vzalloc" -> "vcalloc"
  | _ -> failwith "unknown"

@@
    size_t e1,e2;
    constant C1, C2;
    expression E1, E2, COUNT, x1, x2, x3;
    typedef u8;
    typedef __u8;
    type t = {u8,__u8,char,unsigned char};
    identifier alloc = {vmalloc,vzalloc};
    fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@

(
      alloc(x1*x2*x3)
|
      alloc(C1 * C2)
|
      alloc((sizeof(t)) * (COUNT), ...)
|
-     alloc((e1) * (e2))
+     realloc(e1, e2)
|
-     alloc((e1) * (COUNT))
+     realloc(COUNT, e1)
|
-     alloc((E1) * (E2))
+     realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-10-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
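The overflow protection that motivates this change can be shown with a user-space analogue. The sketch below is an assumption-laden illustration, not the kernel implementation: alloc_array() is a hypothetical helper built on malloc() that takes the element count and element size separately and refuses the request when their product would not fit in size_t.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical user-space analogue of an array allocator: returns NULL
 * when n * size would overflow size_t instead of allocating a buffer
 * that is silently too small.
 */
static void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL; /* n * size does not fit in size_t */

    return malloc(n * size);
}
```

With a plain malloc(n * size) call the multiplication would wrap around and succeed with a short buffer; passing the two factors separately lets the allocator detect that case.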
|
Revision tags: v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25 |
|
#
d24c2827 |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: publish events to the clients
When the Core device gets an event from the device, or notices the device FW to be up or down, it needs to send those events on to the clients that have an event handler. Add the code to pass along the events to the clients.
The entry points pdsc_register_notify() and pdsc_unregister_notify() are EXPORTed for other drivers that want to listen for these events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
65e0185a |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: set up the VIF definitions and defaults
The Virtual Interfaces (VIFs) supported by the DSC's configuration (vDPA, Eth, RDMA, etc) are reported in the dev_ident struct and made visible in debugfs. At this point only vDPA is supported in this driver so we only set up devices for that feature.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
01ba61b5 |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: Add adminq processing and commands
Add the service routines for submitting and processing the adminq messages and for handling notifyq events.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
45d76f49 |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: set up device and adminq
Set up the basic adminq and notifyq queue structures. These are used mostly by the client drivers for feature configuration. These are essentially the same adminq and notifyq as in the ionic driver.
Part of this includes querying for device identity and FW information, so we can make that available to devlink dev info.
  $ devlink dev info pci/0000:b5:00.0
  pci/0000:b5:00.0:
    driver pds_core
    serial_number FLM18420073
    versions:
        fixed:
          asic.id 0x0
          asic.rev 0x0
        running:
          fw 1.51.0-73
        stored:
          fw.goldfw 1.15.9-C-22
          fw.mainfwa 1.60.0-73
          fw.mainfwb 1.60.0-57
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
25b450c0 |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: add devlink health facilities
Add devlink health reporting on top of our fw watchdog.
Example:
  # devlink health show pci/0000:2b:00.0 reporter fw
  pci/0000:2b:00.0:
    reporter fw
      state healthy error 0 recover 0
  # devlink health diagnose pci/0000:2b:00.0 reporter fw
   Status: healthy State: 1 Generation: 0 Recoveries: 0
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c2dbb090 |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: health timer and workqueue
Add in the periodic health check and the related workqueue, as well as the handlers for when a FW reset is seen.
The firmware is polled every 5 seconds to be sure that it is still alive and that the FW generation didn't change.
The alive check looks to see that the PCI bus is still readable and the fw_status still has the RUNNING bit on. If not alive, the driver stops activity and tears things down. When the FW recovers and the alive check again succeeds, the driver sets back up for activity.
The generation check looks at the fw_generation to see if it has changed, which can happen if the FW crashed and recovered or was updated in between health checks. If changed, the driver counts that as though the alive test failed and forces the fw_down/fw_up cycle.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
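The alive and generation checks described in this commit amount to a small decision function. The sketch below models that logic in plain C under several assumptions: the enum, the struct, the bit value of FW_STATUS_RUNNING and the use of an all-ones read to mean "bus not readable" are all hypothetical choices made for illustration, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen for illustration only. */
#define FW_STATUS_RUNNING    0x01u
#define FW_STATUS_UNREADABLE 0xffu /* assumed: all-ones read means bus is gone */

enum health_action {
    HEALTH_NO_CHANGE, /* nothing to do on this poll */
    HEALTH_FW_DOWN,   /* stop activity and tear down */
    HEALTH_FW_UP,     /* firmware recovered, set back up */
    HEALTH_FW_CYCLE   /* generation changed, force fw_down then fw_up */
};

struct health_state {
    bool fw_up;         /* what the driver currently believes */
    uint8_t generation; /* last firmware generation seen */
};

/* One periodic poll: decide what to do from fw_status and fw_generation. */
static enum health_action health_check(struct health_state *st,
                                       uint8_t fw_status,
                                       uint8_t fw_generation)
{
    bool alive = fw_status != FW_STATUS_UNREADABLE &&
                 (fw_status & FW_STATUS_RUNNING) != 0;

    if (!alive) {
        if (!st->fw_up)
            return HEALTH_NO_CHANGE; /* already torn down */
        st->fw_up = false;
        return HEALTH_FW_DOWN;
    }

    if (!st->fw_up) {
        st->fw_up = true;
        st->generation = fw_generation;
        return HEALTH_FW_UP;
    }

    if (fw_generation != st->generation) {
        /* Treated as though the alive test had failed. */
        st->generation = fw_generation;
        return HEALTH_FW_CYCLE;
    }

    return HEALTH_NO_CHANGE;
}
```

A caller would invoke health_check() from its periodic timer (every 5 seconds in the commit message) and act on the returned value.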
|
#
523847df |
| 19-Apr-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: add devcmd device interfaces
The devcmd interface is the basic connection to the device through the PCI BAR for low level identification and command services. This does the early device initialization and finds the identity data, and adds devcmd routines to be used by later driver bits.
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|