.. SPDX-License-Identifier: GPL-2.0

===============
Boot Interrupts
===============

:Author: - Sean V Kelley <sean.v.kelley@linux.intel.com>

Overview
========

On PCI Express, interrupts are represented with either MSI or inbound
interrupt messages (Assert_INTx/Deassert_INTx). The integrated IO-APIC in a
given Core IO converts the legacy interrupt messages from PCI Express to
MSI interrupts. If the IO-APIC is disabled (via the mask bits in the
IO-APIC table entries), the messages are routed to the legacy PCH. This
in-band interrupt mechanism was traditionally necessary for systems that
did not support the IO-APIC and for boot. Intel in the past has used the
term "boot interrupts" to describe this mechanism. Further, the PCI Express
protocol describes this in-band legacy wire-interrupt INTx mechanism for
I/O devices to signal PCI-style level interrupts. The subsequent paragraphs
describe problems with the Core IO handling of INTx message routing to the
PCH and mitigation within BIOS and the OS.


Issue
=====

When in-band legacy INTx messages are forwarded to the PCH, they in turn
trigger a new interrupt for which the OS likely lacks a handler. When
interrupts go unhandled over time, the Linux kernel tracks them as
spurious interrupts. The kernel disables the IRQ once the unhandled count
reaches a specific threshold, reporting the error "nobody cared". The
disabled IRQ then prevents valid usage by an existing interrupt which may
happen to share the IRQ line::

  irq 19: nobody cared (try booting with the "irqpoll" option)
  CPU: 0 PID: 2988 Comm: irq/34-nipalk Tainted: 4.14.87-rt49-02410-g4a640ec-dirty #1
  Hardware name: National Instruments NI PXIe-8880/NI PXIe-8880, BIOS 2.1.5f1 01/09/2020
  Call Trace:
  <IRQ>
   ? dump_stack+0x46/0x5e
   ? __report_bad_irq+0x2e/0xb0
   ? note_interrupt+0x242/0x290
   ? nNIKAL100_memoryRead16+0x8/0x10 [nikal]
   ? handle_irq_event_percpu+0x55/0x70
   ? handle_irq_event+0x4f/0x80
   ? handle_fasteoi_irq+0x81/0x180
   ? handle_irq+0x1c/0x30
   ? do_IRQ+0x41/0xd0
   ? common_interrupt+0x84/0x84
  </IRQ>

  handlers:
  irq_default_primary_handler threaded usb_hcd_irq
  Disabling IRQ #19


Conditions
==========

The use of threaded interrupts is the most likely condition to trigger
this problem today. Threaded interrupts may not be reenabled after the IRQ
handler wakes. These "one shot" conditions mean that the threaded interrupt
needs to keep the interrupt line masked until the threaded handler has run.
Especially when dealing with high data rate interrupts, the thread needs to
run to completion; otherwise some handlers will end up in stack overflows
since the interrupt of the issuing device is still active.

Affected Chipsets
=================

The legacy interrupt forwarding mechanism exists today in a number of
devices including but not limited to chipsets from AMD/ATI, Broadcom, and
Intel. Changes made through the mitigations below have been applied to
drivers/pci/quirks.c.

Starting with ICX there are no longer any IO-APICs in the Core IO's
devices. The IO-APIC is only in the PCH. Devices connected to the Core IO's
PCIe Root Ports will use native MSI/MSI-X mechanisms.

Mitigations
===========

The mitigations take the form of PCI quirks. The preference has been to
first identify and make use of a means to disable the routing to the PCH.
In such a case a quirk to disable boot interrupt generation can be
added.[1] ::

  Intel® 6300ESB I/O Controller Hub
    Alternate Base Address Register:
      BIE: Boot Interrupt Enable
        0 = Boot interrupt is enabled.
        1 = Boot interrupt is disabled.

  Intel® Sandy Bridge through Sky Lake based Xeon servers:
    Coherent Interface Protocol Interrupt Control
      dis_intx_route2pch/dis_intx_route2ich/dis_intx_route2dmi2:
        When this bit is set, local INTx messages received from the
        Intel® Quick Data DMA/PCI Express ports are not routed to legacy
        PCH - they are either converted into MSI via the integrated IO-APIC
        (if the IO-APIC mask bit is clear in the appropriate entries)
        or cause no further action (when mask bit is set).

In the absence of a way to directly disable the routing, another approach
has been to make use of PCI Interrupt pin to INTx routing tables for
purposes of redirecting the interrupt handler to the rerouted interrupt
line by default. Therefore, on chipsets where this INTx routing cannot be
disabled, the Linux kernel will reroute the valid interrupt to its legacy
interrupt. This redirection of the handler will prevent the occurrence of
the spurious interrupt detection which would ordinarily disable the IRQ
line due to excessive unhandled counts.[2]

The config option X86_REROUTE_FOR_BROKEN_BOOT_IRQS exists to enable (or
disable) the redirection of the interrupt handler to the PCH interrupt
line. The option can be overridden by either pci=ioapicreroute or
pci=noioapicreroute.[3]


More Documentation
==================

There is an overview of the legacy interrupt handling in several datasheets
(6300ESB and 6700PXH below). While largely the same, they provide insight
into how this handling evolved across chipsets.
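
As a reading aid for the first datasheet example below, the BIE semantics
described under Mitigations (0 = boot interrupt enabled, 1 = disabled) can be
sketched as a read-modify-write on a register value. This is a minimal
user-space sketch, not the kernel quirk itself: the helper names are
hypothetical, the register is modelled as a plain 16-bit value, and the bit
position (bit 14) is an assumption made for illustration that should be
checked against the 6300ESB datasheet. A real quirk would access the register
through PCI configuration space rather than a local variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION for illustration only: BIE is taken to be bit 14 of a
 * 16-bit Alternate Base Address Register. Verify against the datasheet.
 */
#define EXAMPLE_BIE_BIT ((uint16_t)(1u << 14))

/* Hypothetical helper: return the register value with BIE set (disabled). */
static inline uint16_t disable_boot_interrupt(uint16_t abar)
{
	/* Set only the BIE bit; every other bit is preserved. */
	return (uint16_t)(abar | EXAMPLE_BIE_BIT);
}

/* Hypothetical helper: BIE = 0 means the boot interrupt is enabled. */
static inline bool boot_interrupt_enabled(uint16_t abar)
{
	return (abar & EXAMPLE_BIE_BIT) == 0;
}

/* Usage example: start with BIE clear, then disable the boot interrupt. */
static inline void example_usage(void)
{
	uint16_t abar = 0x00ff;	/* hypothetical initial register contents */

	assert(boot_interrupt_enabled(abar));
	abar = disable_boot_interrupt(abar);
	assert(!boot_interrupt_enabled(abar));
	assert(abar == 0x40ff);	/* only bit 14 changed */
}
```

Setting the bit with a bitwise OR instead of writing a constant keeps the
remaining register fields untouched, which matters because the same register
carries unrelated configuration.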

Example of disabling of the boot interrupt
------------------------------------------

Intel® 6300ESB I/O Controller Hub (Document # 300641-004US)
	5.7.3 Boot Interrupt
	https://www.intel.com/content/dam/doc/datasheet/6300esb-io-controller-hub-datasheet.pdf

Intel® Xeon® Processor E5-1600/2400/2600/4600 v3 Product Families
Datasheet - Volume 2: Registers (Document # 330784-003)
	6.6.41 cipintrc Coherent Interface Protocol Interrupt Control
	https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Example of handler rerouting
----------------------------

Intel® 6700PXH 64-bit PCI Hub (Document # 302628)
	2.15.2 PCI Express Legacy INTx Support and Boot Interrupt
	https://www.intel.com/content/dam/doc/datasheet/6700pxh-64-bit-pci-hub-datasheet.pdf


If you have any legacy PCI interrupt questions that aren't answered, email me.

Cheers,
    Sean V Kelley
    sean.v.kelley@linux.intel.com

[1] https://lore.kernel.org/r/12131949181903-git-send-email-sassmann@suse.de/
[2] https://lore.kernel.org/r/12131949182094-git-send-email-sassmann@suse.de/
[3] https://lore.kernel.org/r/487C8EA7.6020205@suse.de/