Revision tags: v6.6.67 |
|
#
278002ed |
| 15-Dec-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.66' into for/openbmc/dev-6.6
This is the 6.6.66 stable release
|
Revision tags: v6.6.66, v6.6.65, v6.6.64, v6.6.63, v6.6.62, v6.6.61, v6.6.60, v6.6.59 |
|
#
407476eb |
| 25-Oct-2024 |
Keith Busch <kbusch@kernel.org> |
PCI: Add 'reset_subordinate' to reset hierarchy below bridge
[ Upstream commit 2fa046449a82a7d0f6d9721dd83e348816038444 ]
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary
PCI: Add 'reset_subordinate' to reset hierarchy below bridge
[ Upstream commit 2fa046449a82a7d0f6d9721dd83e348816038444 ]
The "bus" and "cxl_bus" reset methods reset a device by asserting Secondary Bus Reset on the bridge leading to the device. These only work if the device is the only device below the bridge.
Add a sysfs 'reset_subordinate' attribute on bridges that can assert Secondary Bus Reset regardless of how many devices are below the bridge.
This resets all the devices below a bridge in a single command, including the locking and config space save/restore that reset methods normally do.
This may be the only way to reset devices that don't support other reset methods (ACPI, FLR, PM reset, etc).
Link: https://lore.kernel.org/r/20241025222755.3756162-1-kbusch@meta.com Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: commit log, add capable(CAP_SYS_ADMIN) check] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
ecc23d0a |
| 09-Dec-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.64' into for/openbmc/dev-6.6
This is the 6.6.64 stable release
|
Revision tags: v6.6.58, v6.6.57, v6.6.56, v6.6.55, v6.6.54 |
|
#
8e098baf |
| 01-Oct-2024 |
Todd Kjos <tkjos@google.com> |
PCI: Fix reset_method_store() memory leak
[ Upstream commit 2985b1844f3f3447f2d938eff1ef6762592065a5 ]
In reset_method_store(), a string is allocated via kstrndup() and assigned to the local "optio
PCI: Fix reset_method_store() memory leak
[ Upstream commit 2985b1844f3f3447f2d938eff1ef6762592065a5 ]
In reset_method_store(), a string is allocated via kstrndup() and assigned to the local "options". options is then used in with strsep() to find spaces:
while ((name = strsep(&options, " ")) != NULL) {
If there are no remaining spaces, then options is set to NULL by strsep(), so the subsequent kfree(options) doesn't free the memory allocated via kstrndup().
Fix by using a separate tmp_options to iterate with strsep() so options is preserved.
Link: https://lore.kernel.org/r/20241001231147.3583649-1-tkjos@google.com Fixes: d88f521da3ef ("PCI: Allow userspace to query and set device reset mechanism") Signed-off-by: Todd Kjos <tkjos@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
34d6f206 |
| 07-Oct-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.54' into for/openbmc/dev-6.6
This is the 6.6.54 stable release
|
Revision tags: v6.6.53, v6.6.52, v6.6.51, v6.6.50, v6.6.49, v6.6.48, v6.6.47, v6.6.46, v6.6.45 |
|
#
3d8573ab |
| 09-Aug-2024 |
Maciej W. Rozycki <macro@orcam.me.uk> |
PCI: Use an error code with PCIe failed link retraining
commit 59100eb248c0b15585affa546c7f6834b30eb5a4 upstream.
Given how the call place in pcie_wait_for_link_delay() got structured now, and that
PCI: Use an error code with PCIe failed link retraining
commit 59100eb248c0b15585affa546c7f6834b30eb5a4 upstream.
Given how the call place in pcie_wait_for_link_delay() got structured now, and that pcie_retrain_link() returns a potentially useful error code, convert pcie_failed_link_retrain() to return an error code rather than a boolean status, fixing handling at the call site mentioned. Update the other call site accordingly.
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'") Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/ Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
894f2111 |
| 09-Aug-2024 |
Maciej W. Rozycki <macro@orcam.me.uk> |
PCI: Clear the LBMS bit after a link retrain
commit 8037ac08c2bbb3186f83a5a924f52d1048dbaec5 upstream.
The LBMS bit, where implemented, is set by hardware either in response to the completion of re
PCI: Clear the LBMS bit after a link retrain
commit 8037ac08c2bbb3186f83a5a924f52d1048dbaec5 upstream.
The LBMS bit, where implemented, is set by hardware either in response to the completion of retraining caused by writing 1 to the Retrain Link bit or whenever hardware has changed the link speed or width in attempt to correct unreliable link operation. It is never cleared by hardware other than by software writing 1 to the bit position in the Link Status register and we never do such a write.
We currently have two places, namely apply_bad_link_workaround() and pcie_failed_link_retrain() in drivers/pci/controller/dwc/pcie-tegra194.c and drivers/pci/quirks.c respectively where we check the state of the LBMS bit and neither is interested in the state of the bit resulting from the completion of retraining, both check for a link fault.
And in particular pcie_failed_link_retrain() causes issues consequently, by trying to retrain a link where there's no downstream device anymore and the state of 1 in the LBMS bit has been retained from when there was a device downstream that has since been removed.
Clear the LBMS bit then at the conclusion of pcie_retrain_link(), so that we have a single place that controls it and that our code can track link speed or width changes resulting from unreliable link operation.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures") Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk Reported-by: Matthew W Carlis <mattc@purestorage.com> Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/ Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/ Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: <stable@vger.kernel.org> # v6.5+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
390de4d0 |
| 08-Aug-2024 |
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> |
PCI: Wait for Link before restoring Downstream Buses
[ Upstream commit 3e40aa29d47e231a54640addf6a09c1f64c5b63f ]
__pci_reset_bus() calls pci_bridge_secondary_bus_reset() to perform the reset and a
PCI: Wait for Link before restoring Downstream Buses
[ Upstream commit 3e40aa29d47e231a54640addf6a09c1f64c5b63f ]
__pci_reset_bus() calls pci_bridge_secondary_bus_reset() to perform the reset and also waits for the Secondary Bus to become again accessible. __pci_reset_bus() then calls pci_bus_restore_locked() that restores the PCI devices connected to the bus, and if necessary, recursively restores also the subordinate buses and their devices.
The logic in pci_bus_restore_locked() does not take into account that after restoring a device on one level, there might be another Link Downstream that can only start to come up after restore has been performed for its Downstream Port device. That is, the Link may require additional wait until it becomes accessible.
Similarly, pci_slot_restore_locked() lacks wait.
Amend pci_bus_restore_locked() and pci_slot_restore_locked() to wait for the Secondary Bus before recursively performing the restore of that bus.
Fixes: 090a3c5322e9 ("PCI: Add pci_reset_slot() and pci_reset_bus()") Link: https://lore.kernel.org/r/20240808121708.2523-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
ca2478a7 |
| 12-Sep-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.51' into for/openbmc/dev-6.6
This is the 6.6.51 stable release
|
Revision tags: v6.6.44, v6.6.43, v6.6.42, v6.6.41, v6.6.40, v6.6.39, v6.6.38, v6.6.37, v6.6.36, v6.6.35, v6.6.34, v6.6.33 |
|
#
78c6e39f |
| 30-May-2024 |
Dan Williams <dan.j.williams@intel.com> |
PCI: Add missing bridge lock to pci_bus_lock()
[ Upstream commit a4e772898f8bf2e7e1cf661a12c60a5612c4afab ]
One of the true positives that the cfg_access_lock lockdep effort identified is this sequ
PCI: Add missing bridge lock to pci_bus_lock()
[ Upstream commit a4e772898f8bf2e7e1cf661a12c60a5612c4afab ]
One of the true positives that the cfg_access_lock lockdep effort identified is this sequence:
WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70 RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70 Call Trace: <TASK> ? __warn+0x8c/0x190 ? pci_bridge_secondary_bus_reset+0x5d/0x70 ? report_bug+0x1f8/0x200 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? pci_bridge_secondary_bus_reset+0x5d/0x70 pci_reset_bus+0x1d8/0x270 vmd_probe+0x778/0xa10 pci_device_probe+0x95/0x120
Where pci_reset_bus() users are triggering unlocked secondary bus resets. Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses pci_bus_lock() before issuing the reset which locks everything *but* the bridge itself.
For the same motivation as adding:
bridge = pci_upstream_bridge(dev); if (bridge) pci_dev_lock(bridge);
to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add pci_dev_lock() for @bus->self to pci_bus_lock().
Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com Reported-by: Imre Deak <imre.deak@intel.com> Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Keith Busch <kbusch@kernel.org> [bhelgaas: squash in recursive locking deadlock fix from Keith Busch: https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Kalle Valo <kvalo@kernel.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
7e24a55b |
| 04-Aug-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.44' into for/openbmc/dev-6.6
This is the 6.6.44 stable release
|
#
2cc8973b |
| 18-Jun-2024 |
Lukas Wunner <lukas@wunner.de> |
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
commit 11a1f4bc47362700fcbde717292158873fb847ed upstream.
Keith reports a use-after-free when a DPC event occurs concurrently to hot-re
PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
commit 11a1f4bc47362700fcbde717292158873fb847ed upstream.
Keith reports a use-after-free when a DPC event occurs concurrently to hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the Downstream Port where the DPC event occurred. To do so, it polls the config space of the first child device on the secondary bus. If that child device is concurrently removed, accesses to its struct pci_dev cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a reference on the child device. Before v6.3, the function was only called on resume from system sleep or on runtime resume. Holding a reference wasn't necessary back then because the pciehp IRQ thread could never run concurrently. (On resume from system sleep, IRQs are not enabled until after the resume_noirq phase. And runtime resume is always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset"), which introduced that, failed to appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a reference on the child device because dpc_handler() and pciehp may indeed run concurrently. The commit was backported to v5.10+ stable kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0 CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0 RIP: pci_bus_read_config_dword+0x17/0x50 pci_dev_wait() pci_bridge_wait_for_secondary_bus() dpc_reset_link() pcie_do_recovery() dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset") Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/ Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de Reported-by: Keith Busch <kbusch@kernel.org> Tested-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
57904291 |
| 27-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.36' into dev-6.6
This is the 6.6.36 stable release
|
Revision tags: v6.6.32, v6.6.31, v6.6.30, v6.6.29, v6.6.28, v6.6.27, v6.6.26, v6.6.25, v6.6.24, v6.6.23 |
|
#
fae0e055 |
| 08-Feb-2024 |
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> |
PCI: Do not wait for disconnected devices when resuming
[ Upstream commit 6613443ffc49d03e27f0404978f685c4eac43fba ]
On runtime resume, pci_dev_wait() is called:
pci_pm_runtime_resume() pci_
PCI: Do not wait for disconnected devices when resuming
[ Upstream commit 6613443ffc49d03e27f0404978f685c4eac43fba ]
On runtime resume, pci_dev_wait() is called:
pci_pm_runtime_resume() pci_pm_bridge_power_up_actions() pci_bridge_wait_for_secondary_bus() pci_dev_wait()
While a device is runtime suspended along with its PCI hierarchy, the device could get disconnected. In such case, the link will not come up no matter how long pci_dev_wait() waits for it.
Besides the above mentioned case, there could be other ways to get the device disconnected while pci_dev_wait() is waiting for the link to come up.
Make pci_dev_wait() exit if the device is already disconnected to avoid unnecessary delay.
The use cases of pci_dev_wait() boil down to two:
1. Waiting for the device after reset 2. pci_bridge_wait_for_secondary_bus()
The callers in both cases seem to benefit from propagating the disconnection as error even if device disconnection would be more analoguous to the case where there is no device in the first place which return 0 from pci_dev_wait(). In the case 2, it results in unnecessary marking of the devices disconnected again but that is just harmless extra work.
Also make sure compiler does not become too clever with dev->error_state and use READ_ONCE() to force a fetch for the up-to-date value.
Link: https://lore.kernel.org/r/20240208132322.4811-1-ilpo.jarvinen@linux.intel.com Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
71962891 |
| 07-Mar-2024 |
Mario Limonciello <mario.limonciello@amd.com> |
PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports
[ Upstream commit 256df20c590bf0e4d63ac69330cf23faddac3e08 ]
Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge system
PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports
[ Upstream commit 256df20c590bf0e4d63ac69330cf23faddac3e08 ]
Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge system with a muxless AMD Radeon dGPU. Attempting to use the dGPU fails with the following sequence:
ACPI Error: Aborting method \AMD3._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529) radeon 0000:01:00.0: not ready 1023ms after resume; waiting radeon 0000:01:00.0: not ready 2047ms after resume; waiting radeon 0000:01:00.0: not ready 4095ms after resume; waiting radeon 0000:01:00.0: not ready 8191ms after resume; waiting radeon 0000:01:00.0: not ready 16383ms after resume; waiting radeon 0000:01:00.0: not ready 32767ms after resume; waiting radeon 0000:01:00.0: not ready 65535ms after resume; giving up radeon 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
The issue is that the Root Port the dGPU is connected to can't handle the transition from D3cold to D0 so the dGPU can't properly exit runtime PM.
The existing logic in pci_bridge_d3_possible() checks for systems that are newer than 2015 to decide that D3 is safe. This would nominally work for an Ivy Bridge system (which was discontinued in 2015), but this system appears to have continued to receive BIOS updates until 2017 and so this existing logic doesn't appropriately capture it.
Add the system to bridge_d3_blacklist to prevent D3cold from being used.
Link: https://lore.kernel.org/r/20240307163709.323-1-mario.limonciello@amd.com Reported-by: Eric Heintzmann <heintzmann.eric@free.fr> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3229 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Eric Heintzmann <heintzmann.eric@free.fr> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
b181f702 |
| 12-Jun-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.33' into dev-6.6
This is the 6.6.33 stable release
|
#
0053891e |
| 23-Apr-2024 |
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> |
PCI: Wait for Link Training==0 before starting Link retrain
[ Upstream commit 73cb3a35f94db723c0211ad099bce55b2155e3f0 ]
Two changes were made in link retraining logic independent of each other.
T
PCI: Wait for Link Training==0 before starting Link retrain
[ Upstream commit 73cb3a35f94db723c0211ad099bce55b2155e3f0 ]
Two changes were made in link retraining logic independent of each other.
The commit e7e39756363a ("PCI/ASPM: Avoid link retraining race") added a check to pcie_retrain_link() to ensure no Link Training is currently active to address the Implementation Note in PCIe r6.1 sec 7.5.3.7. At that time pcie_wait_for_retrain() only checked for the Link Training (LT) bit being cleared.
The commit 680e9c47a229 ("PCI: Add support for polling DLLLA to pcie_retrain_link()") generalized pcie_wait_for_retrain() into pcie_wait_for_link_status() which can wait either for LT or the Data Link Layer Link Active (DLLLA) bit with 'use_lt' argument and supporting waiting for either cleared or set using 'active' argument.
In the merge commit 1abb47390350 ("Merge branch 'pci/enumeration'"), those two divergent branches converged. The merge changed LT bit checking added in the commit e7e39756363a ("PCI/ASPM: Avoid link retraining race") to now wait for completion of any ongoing Link Training using DLLLA bit being set if 'use_lt' is false.
When 'use_lt' is false, the pseudo-code steps of what occurs in pcie_retrain_link():
1. Wait for DLLLA==1 2. Trigger link to retrain 3. Wait for DLLLA==1
Step 3 waits for the link to come up from the retraining triggered by Step 2. As Step 1 is supposed to wait for any ongoing retraining to end, using DLLLA also for it does not make sense because link training being active is still indicated using LT bit, not with DLLLA.
Correct the pcie_wait_for_link_status() parameters in Step 1 to only wait for LT==0 to ensure there is no ongoing Link Training.
This only impacts the Target Speed quirk, which is the only case where waiting for DLLLA bit is used. It currently works in the problematic case by means of link training getting initiated by hardware repeatedly and respecting the new link parameters set by the caller, which then make training succeed and bring the link up, setting DLLLA and causing pcie_wait_for_link_status() to return success. We are not supposed to rely on luck and need to make sure that LT transitioned through the inactive state though before we initiate link training by hand via RL (Retrain Link) bit.
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'") Link: https://lore.kernel.org/r/20240423130820.43824-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
c845428b |
| 28-Apr-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.29' into dev-6.6
This is the 6.6.29 stable release
|
Revision tags: v6.6.16, v6.6.15 |
|
#
b0f44788 |
| 30-Jan-2024 |
Johan Hovold <johan+linaro@kernel.org> |
PCI/ASPM: Fix deadlock when enabling ASPM
commit 1e560864159d002b453da42bd2c13a1805515a20 upstream.
A last minute revert in 6.7-final introduced a potential deadlock when enabling ASPM during probe
PCI/ASPM: Fix deadlock when enabling ASPM
commit 1e560864159d002b453da42bd2c13a1805515a20 upstream.
A last minute revert in 6.7-final introduced a potential deadlock when enabling ASPM during probe of Qualcomm PCIe controllers as reported by lockdep:
============================================ WARNING: possible recursive locking detected 6.7.0 #40 Not tainted -------------------------------------------- kworker/u16:5/90 is trying to acquire lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock: ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(pci_bus_sem); lock(pci_bus_sem);
*** DEADLOCK ***
Call trace: print_deadlock_bug+0x25c/0x348 __lock_acquire+0x10a4/0x2064 lock_acquire+0x1e8/0x318 down_read+0x60/0x184 pcie_aspm_pm_state_change+0x58/0xdc pci_set_full_power_state+0xa8/0x114 pci_set_power_state+0xc4/0x120 qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom] pci_walk_bus+0x64/0xbc qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad X13s by adding a delay to increase the race window during asynchronous probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that can be called with the PCI bus semaphore held to avoid taking the read lock twice.
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 6.7 [bhelgaas: backported to v6.6.y, which contains 8cc22ba3f77c ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()""), a backport of f93e71aea6c6. This omits the drivers/pci/controller/dwc/pcie-qcom.c hunk that updates qcom_pcie_enable_aspm(), which was added by 9f4f3dfad8cf ("PCI: qcom: Enable ASPM for platforms supporting 1.9.0 ops"), which is not present in v6.6.28.] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
c595db6d |
| 13-Mar-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.18' into dev-6.6
This is the 6.6.18 stable release
|
Revision tags: v6.6.14 |
|
#
63b1a3d9 |
| 23-Jan-2024 |
Alex Williamson <alex.williamson@redhat.com> |
PCI: Fix active state requirement in PME polling
[ Upstream commit 41044d5360685e78a869d40a168491a70cdb7e73 ]
The commit noted in fixes added a bogus requirement that runtime PM managed devices nee
PCI: Fix active state requirement in PME polling
[ Upstream commit 41044d5360685e78a869d40a168491a70cdb7e73 ]
The commit noted in fixes added a bogus requirement that runtime PM managed devices need to be in the RPM_ACTIVE state for PME polling. In fact, only devices in low power states should be polled.
However there's still a requirement that the device config space must be accessible, which has implications for both the current state of the polled device and the parent bridge, when present. It's not sufficient to assume the bridge remains in D0 and cases have been observed where the bridge passes the D0 test, but the PM state indicates RPM_SUSPENDING and config space of the polled device becomes inaccessible during pci_pme_wakeup().
Therefore, since the bridge is already effectively required to be in the RPM_ACTIVE state, formalize this in the code and elevate the PM usage count to maintain the state while polling the subordinate device.
This resolves a regression reported in the bugzilla below where a Thunderbolt/USB4 hierarchy fails to scan for an attached NVMe endpoint downstream of a bridge in a D3hot power state.
Link: https://lore.kernel.org/r/20240123185548.1040096-1-alex.williamson@redhat.com Fixes: d3fcd7360338 ("PCI: Fix runtime PM race with PME polling") Reported-by: Sanath S <sanath.s@amd.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218360 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Sanath S <sanath.s@amd.com> Reviewed-by: Rafael J. Wysocki <rafael@kernel.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|
#
59f5a149 |
| 10-Feb-2024 |
Andrew Jeffery <andrew@codeconstruct.com.au> |
Merge tag 'v6.6.11' into dev-6.6
This is the 6.6.11 stable release
|
Revision tags: v6.6.13, v6.6.12, v6.6.11, v6.6.10 |
|
#
8cc22ba3 |
| 01-Jan-2024 |
Bjorn Helgaas <bhelgaas@google.com> |
Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
commit f93e71aea6c60ebff8adbd8941e678302d377869 upstream.
This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
Michael reported that
Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
commit f93e71aea6c60ebff8adbd8941e678302d377869 upstream.
This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
Michael reported that when attempting to resume from suspend to RAM on ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay with no output, followed by a reboot.
Workarounds include:
- Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") - Booting with "pcie_aspm=off" - Booting with "pcie_aspm.policy=performance" - "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm" before suspending - Connecting a USB flash drive
Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") Reported-by: Michael Schaller <michael@5challer.de> Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
Revision tags: v6.6.9, v6.6.8 |
|
#
b97d6790 |
| 13-Dec-2023 |
Joel Stanley <joel@jms.id.au> |
Merge tag 'v6.6.6' into dev-6.6
This is the 6.6.6 stable release
Signed-off-by: Joel Stanley <joel@jms.id.au>
|
Revision tags: v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8 |
|
#
58c91b69 |
| 10-Oct-2023 |
Bjorn Helgaas <bhelgaas@google.com> |
PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk
[ Upstream commit 04e82fa5951ca66495d7b05665eff673aa3852b4 ]
Use FIELD_GET() to remove dependences on the field position, i.e., the shift val
PCI: Use FIELD_GET() in Sapphire RX 5600 XT Pulse quirk
[ Upstream commit 04e82fa5951ca66495d7b05665eff673aa3852b4 ]
Use FIELD_GET() to remove dependences on the field position, i.e., the shift value. No functional change intended.
Separate because this isn't as trivial as the other FIELD_GET() changes.
See 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Link: https://lore.kernel.org/r/20231010204436.1000644-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
show more ...
|