Revision tags: v6.6.25, v6.6.24, v6.6.23 |
|
#
1a6efd4c |
| 12-Feb-2024 |
Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> |
PCI/AER: Block runtime suspend when handling errors
[ Upstream commit 002bf2fbc00e5c4b95fb167287e2ae7d1973281e ]
Runtime PM can run concurrently with AER error handling. Avoid that by calling pm_runtime_get_sync() before and pm_runtime_put() after the reset in pcie_do_recovery() for all recovering devices.
pm_runtime_get_sync() increments the dev->power.usage_count counter, which prevents any later request from runtime suspending the device. It also resumes the device if it was previously in D3hot.
I tested with an igc device by running aer_inject concurrently with runtime PM suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control and can reproduce:
  igc 0000:02:00.0: not ready 65535ms after bus reset; giving up
  pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25)
  pcieport 0000:00:1c.2: AER: subordinate device reset failed
  pcieport 0000:00:1c.2: AER: device recovery failed
  igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
The problem disappears when this patch is applied.
Link: https://lore.kernel.org/r/20240212120135.146068-1-stanislaw.gruszka@linux.intel.com
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
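The get/put bracketing described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_dev, model_get_sync(), model_put(), model_try_suspend() and model_recover() are hypothetical stand-ins, not kernel APIs. They imitate how a nonzero usage count keeps a suspend request from succeeding while a reset is in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum model_power_state { MODEL_D0, MODEL_D3HOT };

struct model_dev {
	int usage_count;              /* stands in for dev->power.usage_count */
	enum model_power_state state;
};

/* Like pm_runtime_get_sync(): take a reference and resume if suspended. */
static void model_get_sync(struct model_dev *dev)
{
	dev->usage_count++;
	dev->state = MODEL_D0;
}

/* Like pm_runtime_put(): drop the reference taken earlier. */
static void model_put(struct model_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* A suspend request succeeds only when nobody holds a reference. */
static bool model_try_suspend(struct model_dev *dev)
{
	if (dev->usage_count > 0)
		return false;
	dev->state = MODEL_D3HOT;
	return true;
}

/*
 * Recovery bracketed by get/put. Returns true if a suspend request
 * arriving in the middle of the reset was refused.
 */
static bool model_recover(struct model_dev *dev)
{
	bool suspend_refused;

	model_get_sync(dev);
	suspend_refused = !model_try_suspend(dev); /* racing request */
	/* ... the reset would run here, with the device held in D0 ... */
	model_put(dev);
	return suspend_refused;
}
```

Calling model_recover() on a device that starts in MODEL_D3HOT resumes it first, refuses the racing suspend, and leaves the usage count at zero afterwards, so a later model_try_suspend() succeeds again.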
|
Revision tags: v6.6.16, v6.6.15, v6.6.14, v6.6.13, v6.6.12, v6.6.11, v6.6.10, v6.6.9, v6.6.8, v6.6.7, v6.6.6, v6.6.5, v6.6.4, v6.6.3, v6.6.2, v6.5.11, v6.6.1, v6.5.10, v6.6, v6.5.9, v6.5.8, v6.5.7, v6.5.6, v6.5.5, v6.5.4, v6.5.3, v6.5.2, v6.1.51, v6.5.1, v6.1.50, v6.5, v6.1.49, v6.1.48, v6.1.46, v6.1.45, v6.1.44, v6.1.43, v6.1.42, v6.1.41, v6.1.40, v6.1.39, v6.1.38, v6.1.37, v6.1.36, v6.4, v6.1.35, v6.1.34, v6.1.33, v6.1.32, v6.1.31, v6.1.30, v6.1.29, v6.1.28, v6.1.27, v6.1.26, v6.3, v6.1.25, v6.1.24, v6.1.23, v6.1.22, v6.1.21, v6.1.20, v6.1.19, v6.1.18, v6.1.17, v6.1.16, v6.1.15, v6.1.14, v6.1.13, v6.2, v6.1.12, v6.1.11, v6.1.10, v6.1.9, v6.1.8, v6.1.7, v6.1.6, v6.1.5, v6.0.19, v6.0.18, v6.1.4, v6.1.3, v6.0.17, v6.1.2, v6.0.16, v6.1.1, v6.0.15, v6.0.14, v6.0.13, v6.1, v6.0.12, v6.0.11, v6.0.10, v5.15.80, v6.0.9, v5.15.79, v6.0.8, v5.15.78, v6.0.7, v5.15.77, v5.15.76, v6.0.6, v6.0.5, v5.15.75, v6.0.4, v6.0.3, v6.0.2, v5.15.74, v5.15.73, v6.0.1, v5.15.72, v6.0, v5.15.71, v5.15.70, v5.15.69, v5.15.68, v5.15.67, v5.15.66, v5.15.65, v5.15.64, v5.15.63, v5.15.62, v5.15.61, v5.15.60, v5.15.59, v5.19, v5.15.58, v5.15.57, v5.15.56, v5.15.55, v5.15.54, v5.15.53, v5.15.52, v5.15.51, v5.15.50, v5.15.49, v5.15.48, v5.15.47, v5.15.46, v5.15.45 |
|
#
5e69a33c |
| 01-Jun-2022 |
Christoph Hellwig <hch@lst.de> |
PCI/ERR: Recognize disconnected devices in report_error_detected()
When a device is already unplugged by pciehp by the time the AER handler is invoked, the PCIe device will already be in the pci_channel_io_perm_failure state. In that case simply return PCI_ERS_RESULT_DISCONNECT instead of trying to do a state transition that will fail.
Also untangle the state transition failure from the lack of methods to improve the debugging output in case it happens again.
Link: https://lore.kernel.org/r/20220601074024.3481035-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
|
Revision tags: v5.15.44, v5.15.43, v5.15.42, v5.18, v5.15.41, v5.15.40, v5.15.39, v5.15.38, v5.15.37, v5.15.36, v5.15.35, v5.15.34, v5.15.33, v5.15.32, v5.15.31, v5.17, v5.15.30, v5.15.29, v5.15.28, v5.15.27, v5.15.26, v5.15.25, v5.15.24, v5.15.23, v5.15.22, v5.15.21, v5.15.20, v5.15.19, v5.15.18, v5.15.17, v5.4.173, v5.15.16, v5.15.15, v5.16, v5.15.10, v5.15.9, v5.15.8, v5.15.7, v5.15.6, v5.15.5, v5.15.4, v5.15.3, v5.15.2 |
|
#
e0217c5b |
| 10-Nov-2021 |
Bjorn Helgaas <bhelgaas@google.com> |
Revert "PCI: Use to_pci_driver() instead of pci_dev->driver"
This reverts commit 2a4d9408c9e8b6f6fc150c66f3fef755c9e20d4a.
Robert reported a NULL pointer dereference caused by the PCI core (local_pci_probe()) calling the i2c_designware_pci driver's .runtime_resume() method before the .probe() method. i2c_dw_pci_resume() depends on initialization done by i2c_dw_pci_probe().
Prior to 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver"), pci_pm_runtime_resume() avoided calling the .runtime_resume() method because pci_dev->driver had not been set yet.
2a4d9408c9e8 and b5f9c644eb1b ("PCI: Remove struct pci_dev->driver") removed pci_dev->driver, replacing it with device->driver, which *has* been set by this time, so pci_pm_runtime_resume() called the .runtime_resume() method when it previously had not.
Fixes: 2a4d9408c9e8 ("PCI: Use to_pci_driver() instead of pci_dev->driver")
Link: https://lore.kernel.org/linux-i2c/CAP145pgdrdiMAT7=-iB1DMgA7t_bMqTcJL4N0=6u8kNY3EU0dw@mail.gmail.com/
Reported-by: Robert Święcki <robert@swiecki.net>
Tested-by: Robert Święcki <robert@swiecki.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
Revision tags: v5.15.1, v5.15, v5.14.14, v5.14.13, v5.14.12 |
|
#
2a4d9408 |
| 12-Oct-2021 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
PCI: Use to_pci_driver() instead of pci_dev->driver
Struct pci_driver contains a struct device_driver, so for PCI devices, it's easy to convert a device_driver * to a pci_driver * with to_pci_driver(). The device_driver * is in struct device, so we don't need to also keep track of the pci_driver * in struct pci_dev.
Replace pci_dev->driver with to_pci_driver(). This is a step toward removing pci_dev->driver.
[bhelgaas: split to separate patch]
Link: https://lore.kernel.org/r/20211004125935.2300113-11-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
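The conversion this commit relies on, going from a pointer to an embedded structure back to the structure that contains it, can be shown in plain user-space C. The sketch below is illustrative only: the model_ structures and model_container_of() are hypothetical, trimmed-down stand-ins for the kernel's struct device_driver, struct pci_driver and container_of(), and asserting on a NULL argument is a choice made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct model_device_driver {
	const char *name;
};

struct model_pci_driver {
	int id;                             /* illustrative payload */
	struct model_device_driver driver;  /* embedded generic driver */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same idea as to_pci_driver(): device_driver pointer to pci_driver pointer. */
static struct model_pci_driver *
model_to_pci_driver(struct model_device_driver *drv)
{
	assert(drv != NULL); /* this sketch does not model an unbound device */
	return model_container_of(drv, struct model_pci_driver, driver);
}
```

Because the generic driver is embedded rather than pointed to, no second pointer has to be stored or kept in sync: given &pdrv.driver, model_to_pci_driver() returns &pdrv.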
|
#
171d149c |
| 12-Oct-2021 |
Bjorn Helgaas <bhelgaas@google.com> |
PCI/ERR: Factor out common dev->driver expressions
Save the struct pci_driver pointer from pdev->driver instead of repeating it several times. No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
Revision tags: v5.14.11, v5.14.10, v5.14.9, v5.14.8, v5.14.7, v5.14.6, v5.10.67, v5.10.66, v5.14.5, v5.14.4, v5.10.65, v5.14.3, v5.10.64, v5.14.2, v5.10.63, v5.14.1, v5.10.62, v5.14, v5.10.61, v5.10.60, v5.10.53, v5.10.52, v5.10.51, v5.10.50, v5.10.49, v5.13, v5.10.46, v5.10.43, v5.10.42, v5.10.41, v5.10.40, v5.10.39, v5.4.119, v5.10.36, v5.10.35, v5.10.34, v5.4.116, v5.10.33, v5.12, v5.10.32, v5.10.31, v5.10.30, v5.10.27, v5.10.26, v5.10.25, v5.10.24, v5.10.23, v5.10.22, v5.10.21, v5.10.20, v5.10.19, v5.4.101, v5.10.18, v5.10.17, v5.11, v5.10.16, v5.10.15, v5.10.14 |
|
#
387c72cd |
| 04-Jan-2021 |
Keith Busch <kbusch@kernel.org> |
PCI/ERR: Retain status from error notification
Overwriting the frozen detected status with the result of the link reset loses the NEED_RESET result that drivers depend on for error handling to invoke the .slot_reset() callback. Retain this status so that subsequent error handling follows the correct flow.
Link: https://lore.kernel.org/r/20210104230300.1277180-4-kbusch@kernel.org
Reported-by: Hinko Kocevar <hinko.kocevar@ess.eu>
Tested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
Acked-by: Hedi Berriche <hedi.berriche@hpe.com>
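The difference between overwriting and retaining the status can be sketched with two small functions. Everything here is a hypothetical model, not the kernel code: the enum values, both model_status_ helpers and model_calls_slot_reset() are invented for illustration, and mapping a failed reset to a DISCONNECT result is an assumption of the sketch.

```c
/* Hypothetical result codes, loosely mirroring the PCI_ERS_RESULT_ idea. */
enum model_result {
	MODEL_RESULT_RECOVERED,
	MODEL_RESULT_NEED_RESET,
	MODEL_RESULT_DISCONNECT,
};

/* Earlier flow: the link reset result replaces the drivers' vote. */
static enum model_result
model_status_overwritten(enum model_result vote, enum model_result reset_result)
{
	(void)vote;
	return reset_result;
}

/* Flow after this change: a failed reset aborts, otherwise keep the vote. */
static enum model_result
model_status_retained(enum model_result vote, enum model_result reset_result)
{
	if (reset_result != MODEL_RESULT_RECOVERED)
		return MODEL_RESULT_DISCONNECT;
	return vote;
}

/* The slot reset step is reached only if the status still asks for it. */
static int model_calls_slot_reset(enum model_result status)
{
	return status == MODEL_RESULT_NEED_RESET;
}
```

With a NEED_RESET vote and a successful link reset, the overwritten status becomes RECOVERED and the slot reset step is skipped, while the retained status still reaches it.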
|
#
7d7cbeab |
| 04-Jan-2021 |
Keith Busch <kbusch@kernel.org> |
PCI/ERR: Clear status of the reporting device
Error handling operates on the first Downstream Port above the detected error, but the error may have been reported by a downstream device. Clear the AER status of the device that reported the error rather than the first Downstream Port.
Link: https://lore.kernel.org/r/20210104230300.1277180-2-kbusch@kernel.org
Tested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
Acked-by: Hedi Berriche <hedi.berriche@hpe.com>
|
Revision tags: v5.10 |
|
#
57908622 |
| 20-Nov-2020 |
Qiuxu Zhuo <qiuxu.zhuo@intel.com> |
PCI/ERR: Recover from RCiEP AER errors
Add support for handling AER errors detected by Root Complex Integrated Endpoints (RCiEPs). These errors are signaled to software natively via a Root Complex Event Collector (RCEC) or non-natively via ACPI APEI if the platform retains control of AER or uses a non-standard RCEC-like device.
When recovering from RCiEP errors, the Root Error Command and Status registers are in the AER Capability of an associated RCEC (if any), not in a Root Port. In the non-native case, the platform is responsible for those registers and we can't touch them.
[bhelgaas: commit log, etc]
Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-13-sean.v.kelley@intel.com
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
a175102b |
| 02-Dec-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Recover from RCEC AER errors
A Root Complex Event Collector (RCEC) collects and signals AER errors that were detected by Root Complex Integrated Endpoints (RCiEPs), but it may also signal errors it detects itself. This is analogous to errors detected and signaled by a Root Port.
Update the AER service driver to claim RCECs in addition to Root Ports. Add support for handling RCEC-detected AER errors. This does not include handling RCiEP-detected errors that are signaled by the RCEC.
Note that we expect these errors only from the native AER and APEI paths, not from DPC or EDR.
[bhelgaas: split from combined RCEC/RCiEP patch, commit log]
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
aa344bc8 |
| 24-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Clear AER status only when we control AER
In some cases a bridge may not be visible to the OS because the hardware controlling it is handled only by firmware. This scenario is also possible in future use cases involving non-native use of RCECs by firmware. In this scenario, we expect the platform to retain control of the bridge and to clear error status itself.
Clear error status only when the OS has native control of AER.
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
05e9ae19 |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
Consolidate subordinate bus checks with pci_walk_bus() into pci_walk_bridge() for walking below potentially AER affected bridges.
Link: https://lore.kernel.org/r/20201121001036.8560-10-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
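A rough user-space picture of folding the subordinate bus check into one walking helper is given below. It is a sketch under assumptions, not the kernel implementation: the model_ types and functions are hypothetical, and calling the callback on the device itself when it has no subordinate bus is an assumption of this example rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

/* Hypothetical bus: a flat list of the devices directly attached to it. */
struct model_bus {
	struct model_dev **devices;
	size_t count;
};

struct model_dev {
	const char *name;
	struct model_bus *subordinate;  /* NULL when nothing sits below */
};

typedef int (*model_cb)(struct model_dev *dev, void *userdata);

/* Visit every device below a bus, descending through nested bridges. */
static int model_walk_bus(struct model_bus *bus, model_cb cb, void *userdata)
{
	for (size_t i = 0; i < bus->count; i++) {
		struct model_dev *dev = bus->devices[i];
		int ret = cb(dev, userdata);

		if (!ret && dev->subordinate)
			ret = model_walk_bus(dev->subordinate, cb, userdata);
		if (ret)
			return ret;     /* a nonzero result stops the walk */
	}
	return 0;
}

/* One entry point, so callers no longer repeat the subordinate check. */
static void model_walk_bridge(struct model_dev *bridge, model_cb cb,
			      void *userdata)
{
	assert(bridge != NULL && cb != NULL);
	if (bridge->subordinate)
		model_walk_bus(bridge->subordinate, cb, userdata);
	else
		cb(bridge, userdata);   /* assumption: fall back to the device */
}

/* Example callback: count the devices that were visited. */
static int model_count(struct model_dev *dev, void *userdata)
{
	(void)dev;
	(*(int *)userdata)++;
	return 0;
}
```

Walking a bridge with two devices on its subordinate bus visits both of them, while walking a device with no subordinate bus visits only that device.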
|
#
3d7d8fc7 |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Avoid negated conditional for clarity
Reverse the sense of the Root Port/Downstream Port conditional for clarity. No functional change intended.
Link: https://lore.kernel.org/r/20201121001036.8560-9-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
#
0791721d |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
pcie_do_recovery() may be called with "dev" being either a bridge (Root Port or Switch Downstream Port) or an Endpoint. The bulk of the function deals with the bridge, so if we start with an Endpoint, we reset "dev" to be the bridge leading to it.
For clarity, replace "dev" in the body of the function with "bridge". No functional change intended.
Link: https://lore.kernel.org/r/20201121001036.8560-8-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
#
480ef7cb |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Simplify by computing pci_pcie_type() once
Instead of calling pci_pcie_type(dev) twice, call it once and save the result. No functional change intended.
Link: https://lore.kernel.org/r/20201121001036.8560-7-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
#
5d69dcc9 |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Simplify by using pci_upstream_bridge()
Use pci_upstream_bridge() in place of dev->bus->self. No functional change intended.
Link: https://lore.kernel.org/r/20201121001036.8560-6-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
#
8f1bbfbc |
| 20-Nov-2020 |
Sean V Kelley <sean.v.kelley@intel.com> |
PCI/ERR: Rename reset_link() to reset_subordinates()
reset_link() appears to be misnamed. The point is to reset any devices below a given bridge, so rename it to reset_subordinates() to make it clear that we are passing a bridge with the intent to reset the devices below it.
Link: https://lore.kernel.org/r/20201121001036.8560-5-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
Revision tags: v5.8.17, v5.8.16, v5.8.15, v5.9, v5.8.14, v5.8.13, v5.8.12, v5.8.11, v5.8.10, v5.8.9, v5.8.8, v5.8.7, v5.8.6, v5.4.62, v5.8.5, v5.8.4, v5.4.61, v5.8.3, v5.4.60, v5.8.2, v5.4.59, v5.8.1, v5.4.58, v5.4.57, v5.4.56, v5.8, v5.7.12, v5.4.55, v5.7.11, v5.4.54, v5.7.10, v5.4.53, v5.4.52, v5.7.9, v5.7.8, v5.4.51, v5.4.50, v5.7.7, v5.4.49, v5.7.6 |
|
#
068c29a2 |
| 22-Jun-2020 |
Jonathan Cameron <Jonathan.Cameron@huawei.com> |
PCI/ERR: Clear PCIe Device Status errors only if OS owns AER
pcie_clear_device_status() resets the error bits in the PCIe Device Status Register (PCI_EXP_DEVSTA).
Previously we did this unconditionally, but on ACPI systems, the _OSC AER bit negotiates control of the AER capability. Per sec 4.5.1 of the System Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers other error enable/status bits including the following:
  Correctable Error Reporting Enable
  Non-Fatal Error Reporting Enable
  Fatal Error Reporting Enable
  Unsupported Request Reporting Enable
These bits are all in the PCIe Device Control register (the ECN omitted "Reporting", but I think that's a typo), so by implication the _OSC AER bit also applies to the error status bits in the PCIe Device Status register:
  Correctable Error Detected
  Non-Fatal Error Detected
  Fatal Error Detected
  Unsupported Request Detected
Clear the PCIe Device Status error bits only when the OS controls the AER capability and related error enable/status bits. If platform firmware controls the AER capability, firmware is responsible for clearing these bits.
One call path leading here is:
  ghes_do_proc
    ghes_handle_aer
      aer_recover_queue
        schedule_work(&aer_recover_work)
  ...
  aer_recover_work_func
    pcie_do_recovery
      pcie_clear_device_status
[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24, 2020, affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/14076
[bhelgaas: commit log, move test from pcie_clear_device_status() to callers]
Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
600a5b4f |
| 16-Jul-2020 |
Bjorn Helgaas <bhelgaas@google.com> |
PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()
pci_aer_clear_device_status() clears the error bits in the PCIe Device Status Register (PCI_EXP_DEVSTA). Every PCIe device has this register, regardless of whether it supports AER.
Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make clear that it is PCIe-specific but not AER-specific. Move it to drivers/pci/pci.c, again since it's not AER-specific. No functional change intended.
Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
16d79cd4 |
| 02-Jul-2020 |
Luc Van Oostenryck <luc.vanoostenryck@gmail.com> |
PCI: Use 'pci_channel_state_t' instead of 'enum pci_channel_state'
The method struct pci_error_handlers.error_detected() is defined and documented as taking an 'enum pci_channel_state' for the second argument, but most drivers use 'pci_channel_state_t' instead.
This 'pci_channel_state_t' is not a typedef for the enum but a typedef for a bitwise type in order to have better/stricter typechecking.
Consolidate everything by using 'pci_channel_state_t' in the method's definition, in the related helpers and in the drivers.
Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state' with an anonymous 'enum'.
Note: Currently, from a typechecking point of view, this patch changes nothing because only the constants defined by the enum are bitwise, not the enum itself (sparse doesn't have the notion of a 'bitwise enum'). This may change in the not too distant future, hence the patch.
[bhelgaas: squash in
  https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.com
  https://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com]
Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
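The pattern of a bitwise typedef paired with an anonymous enum can be sketched as follows. The names are hypothetical: MODEL_BITWISE and MODEL_FORCE stand in for the kernel's sparse annotations, the model_ constants and their numeric values are illustrative, and outside of sparse the annotations expand to nothing, so an ordinary compiler sees a plain unsigned int.

```c
/*
 * Under sparse (which defines __CHECKER__) these become type attributes;
 * for a regular compiler they are empty. The MODEL_ names are hypothetical.
 */
#ifdef __CHECKER__
#define MODEL_BITWISE __attribute__((bitwise))
#define MODEL_FORCE   __attribute__((force))
#else
#define MODEL_BITWISE
#define MODEL_FORCE
#endif

/* A distinct integer type that sparse refuses to mix with plain integers. */
typedef unsigned int MODEL_BITWISE model_channel_state_t;

/* Anonymous enum: only the constants exist, each cast to the typedef. */
enum {
	model_io_normal       = (MODEL_FORCE model_channel_state_t)1,
	model_io_frozen       = (MODEL_FORCE model_channel_state_t)2,
	model_io_perm_failure = (MODEL_FORCE model_channel_state_t)3,
};

/* Handlers take the typedef rather than a named enum type. */
static int model_is_frozen(model_channel_state_t state)
{
	return state == model_io_frozen;
}
```

Since the enum has no tag, a handler cannot be declared with an enum type by accident; the typedef is the only name available for its parameter.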
|
Revision tags: v5.7.5, v5.4.48, v5.7.4, v5.7.3, v5.4.47, v5.4.46, v5.7.2, v5.4.45, v5.7.1, v5.4.44, v5.7, v5.4.43, v5.4.42, v5.4.41, v5.4.40, v5.4.39, v5.4.38, v5.4.37, v5.4.36, v5.4.35, v5.4.34, v5.4.33, v5.4.32, v5.4.31, v5.4.30, v5.4.29, v5.6, v5.4.28 |
|
#
894020fd |
| 23-Mar-2020 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
PCI/AER: Rationalize error status register clearing
The AER interfaces to clear error status registers were a confusing mess:
- pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors from the Uncorrectable Error Status register.
- pci_aer_clear_fatal_status() cleared fatal errors from the Uncorrectable Error Status register.
- pci_cleanup_aer_error_status_regs() cleared the Root Error Status register (for Root Ports), the Uncorrectable Error Status register, and the Correctable Error Status register.
Rename them to make them consistent:
  From                                      To
  ----------------------------------------  -------------------------------
  pci_cleanup_aer_uncorrect_error_status()  pci_aer_clear_nonfatal_status()
  pci_aer_clear_fatal_status()              pci_aer_clear_fatal_status()
  pci_cleanup_aer_error_status_regs()       pci_aer_clear_status()
Since pci_cleanup_aer_error_status_regs() (renamed to pci_aer_clear_status()) is only used within drivers/pci/, move the declaration from <linux/aer.h> to drivers/pci/pci.h.
[bhelgaas: commit log, add renames]
Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
e8e5ff2a |
| 23-Mar-2020 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
PCI/ERR: Return status of pcie_do_recovery()
As per the DPC Enhancements ECN [1], sec 4.5.1, table 4-4, if the OS supports Error Disconnect Recover (EDR), it must invalidate the software state associated with child devices of the port without attempting to access the child device hardware. In addition, if the OS supports DPC, it must attempt to recover the child devices if the port implements the DPC Capability. If the OS continues operation, the OS must inform the firmware of the status of the recovery operation via the _OST method.
Return the result of pcie_do_recovery() so we can report it to firmware via _OST.
[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019, affecting PCI Firmware Specification, Rev. 3.2
    https://members.pcisig.com/wg/PCI-SIG/document/12888
Link: https://lore.kernel.org/r/eb60ec89448769349c6722954ffbf2de163155b5.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
#
b6cf1a42 |
| 23-Mar-2020 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
PCI/ERR: Remove service dependency in pcie_do_recovery()
Previously we passed the PCIe service type parameter to pcie_do_recovery(), where reset_link() looked up the underlying pci_port_service_driver and its .reset_link() function pointer. Instead of taking this roundabout route, pass the driver-specific .reset_link() callback directly when calling pcie_do_recovery().
This allows us to call pcie_do_recovery() from code that is not a PCIe port service driver, e.g., Error Disconnect Recover (EDR) support.
Remove pcie_port_find_service() and pcie_port_service_driver.reset_link since they are now unused.
Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
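Passing the reset routine as a callback, rather than looking it up from a service type, can be sketched in a few lines of user-space C. All names are hypothetical: model_do_recovery(), the model_reset_fn type and the two example reset routines are invented for illustration and only mirror the shape of the change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes and device state. */
enum model_result { MODEL_RESULT_RECOVERED, MODEL_RESULT_DISCONNECT };

struct model_dev {
	int link_up;   /* 1 once a reset routine has brought the link back */
};

/* Signature of the reset routine the caller hands in. */
typedef enum model_result (*model_reset_fn)(struct model_dev *dev);

/* Recovery no longer looks anything up; it calls what it was given. */
static enum model_result
model_do_recovery(struct model_dev *dev, model_reset_fn reset_link)
{
	assert(dev != NULL && reset_link != NULL);
	return reset_link(dev);
}

/* A reset routine as a port service driver might supply it. */
static enum model_result model_service_reset(struct model_dev *dev)
{
	dev->link_up = 1;
	return MODEL_RESULT_RECOVERED;
}

/* A reset routine from code that is not a port service driver. */
static enum model_result model_other_reset(struct model_dev *dev)
{
	dev->link_up = 0;
	return MODEL_RESULT_DISCONNECT;
}
```

Either routine can be handed to model_do_recovery() without registering a service driver first, which is the property the commit needs for callers such as EDR support.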
|
#
6d2c8944 |
| 23-Mar-2020 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
PCI/ERR: Update error status after reset_link()
Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses reset_link() to recover from fatal errors. But during fatal error recovery, if the initial error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER, then even after a successful recovery (using reset_link()), pcie_do_recovery() reports the recovery result as a failure. Update the error status after reset_link().
You can reproduce this issue by triggering a software DPC using the "DPC Software Trigger" bit in the "DPC Control Register". You should see a recovery failure in the dmesg log, as below:
  pcieport 0000:00:16.0: DPC: containment event, status:0x1f27 source:0x0000
  pcieport 0000:00:16.0: DPC: software trigger detected
  pci 0000:04:00.0: AER: can't recover (no error_detected callback)
  pcieport 0000:00:16.0: AER: device recovery failed
Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Link: https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
[bhelgaas: split pci_channel_io_frozen simplification to separate patch]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
|
#
b5dfbeac |
| 27-Mar-2020 |
Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> |
PCI/ERR: Combine pci_channel_io_frozen cases
pcie_do_recovery() had two "if (state == pci_channel_io_frozen)" cases right after each other. Combine them to make this easier to read. No functional change intended.
Link: https://lore.kernel.org/r/20200317170654.GA23125@infradead.org
[bhelgaas: split from https://lore.kernel.org/r/a255fcb3a3fdebcd90f84e08b555f1786eb8eba2.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|