.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==========================
The MSI Driver Guide HOWTO
==========================

:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox

:Copyright: 2003, 2008 Intel Corporation

About this guide
================

This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
to change your driver to use MSI or MSI-X and some basic diagnostics to
try if a device doesn't support MSIs.


What are MSIs?
==============

A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.

The MSI capability was first specified in PCI 2.2 and was later enhanced
in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
capability was also introduced with PCI 3.0.  It supports more interrupts
per device than MSI and allows interrupts to be independently configured.

Devices may support both MSI and MSI-X, but only one can be enabled at
a time.


Why use MSIs?
=============

There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.

Pin-based PCI interrupts are often shared amongst several devices.
To support this, the kernel must call each interrupt handler associated
with an interrupt, which leads to reduced performance for the system as
a whole.  MSIs are never shared, so this problem cannot arise.

When a device writes data to memory, then raises a pin-based interrupt,
it is possible that the interrupt may arrive before all the data has
arrived in memory (this becomes more likely with devices behind PCI-PCI
bridges).  In order to ensure that all the data has arrived in memory,
the interrupt handler must read a register on the device which raised
the interrupt.  PCI transaction ordering rules require that all the data
arrive in memory before the value may be returned from the register.
Using MSIs avoids this problem as the interrupt-generating write cannot
pass the data writes, so by the time the interrupt is raised, the driver
knows that all the data has arrived in memory.

PCI devices can only support a single pin-based interrupt per function.
Often drivers have to query the device to find out what event has
occurred, slowing down interrupt handling for the common case.  With
MSIs, a device can support more interrupts, allowing each interrupt
to be specialised to a different purpose.
One possible design gives
infrequent conditions (such as errors) their own interrupt which allows
the driver to handle the normal interrupt handling path more efficiently.
Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.


How to use MSIs
===============

PCI devices are initialised to use pin-based interrupts.  The device
driver has to set up the device to use MSI or MSI-X.  Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.

Include kernel support for MSIs
-------------------------------

To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled.  This option is only available on some architectures,
and it may depend on some other options also being set.  For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.

Using MSI
---------

Most of the hard work is done for the driver in the PCI layer.  The driver
simply has to request that the PCI layer set up the MSI capability for this
device.

To automatically use MSI or MSI-X interrupt vectors, use the following
function::

  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                unsigned int max_vecs, unsigned int flags);

which allocates up to max_vecs interrupt vectors for a PCI device.  It
returns the number of vectors allocated or a negative error.  If the device
has a requirement for a minimum number of vectors the driver can pass a
min_vecs argument set to this limit, and the PCI core will return -ENOSPC
if it can't meet the minimum number of vectors.

The flags argument is used to specify which type of interrupt can be used
by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.

To get the Linux IRQ number to pass to request_irq() and free_irq() for a
given vector, use the following function::

  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

Any allocated resources should be freed before removing the device using
the following function::

  void pci_free_irq_vectors(struct pci_dev *dev);

If a device supports both MSI-X and MSI capabilities, this API will use the
MSI-X facilities in preference to the MSI facilities.  MSI-X supports any
number of interrupts between 1 and 2048.  In contrast, MSI is restricted to
a maximum of 32 interrupts (and the number must be a power of two).  In
addition, the MSI interrupt vectors must be allocated consecutively, so the
system might not be able to allocate as many vectors for MSI as it could for
MSI-X.  On some platforms, MSI interrupts must all be targeted at the same
set of CPUs whereas MSI-X interrupts can all be targeted at different CPUs.

If a device supports neither MSI-X nor MSI it will fall back to a single
legacy IRQ vector.

The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device.
If nvec is
larger than the number supported by the device it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand::

  nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
  if (nvec < 0)
          goto out_err;

If a driver is unable or unwilling to deal with a variable number of MSI
interrupts it can request a particular number of interrupts by passing that
number to the pci_alloc_irq_vectors() function as both the 'min_vecs' and
'max_vecs' parameters::

  ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
  if (ret < 0)
          goto out_err;

The most notable example of the request type described above is enabling
the single MSI mode for a device.
It can be done by passing 1 as both
'min_vecs' and 'max_vecs'::

  ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
  if (ret < 0)
          goto out_err;

Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable::

  nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
  if (nvec < 0)
          goto out_err;

Legacy APIs
-----------

The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code::

  pci_enable_msi()              /* deprecated */
  pci_disable_msi()             /* deprecated */
  pci_enable_msix_range()       /* deprecated */
  pci_enable_msix_exact()       /* deprecated */
  pci_disable_msix()            /* deprecated */

Additionally there are APIs to provide the number of supported MSI or MSI-X
vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these
should be avoided in favor of letting pci_alloc_irq_vectors() cap the
number of vectors.  If you have a legitimate special use case for the count
of vectors we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.

Considerations when using MSIs
------------------------------

Spinlocks
~~~~~~~~~

Most device drivers have a per-device spinlock which is taken in the
interrupt handler.  With pin-based interrupts or a single MSI, it is not
necessary to disable interrupts (Linux guarantees the same interrupt will
not be re-entered).  If a device uses multiple interrupts, the driver
must disable interrupts while the lock is held.  Otherwise, if the device
sends a different interrupt while the lock is held, the driver will
deadlock trying to recursively acquire the spinlock.  Such deadlocks can
be avoided by using spin_lock_irqsave() or spin_lock_irq() which disable
local interrupts and acquire the lock (see
Documentation/kernel-hacking/locking.rst).

How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------

Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
has an 'Enable' flag which is followed with either "+" (enabled)
or "-" (disabled).


MSI quirks
==========

Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:

1. globally
2. on all devices behind a specific bridge
3. on a single device

Disabling MSIs globally
-----------------------

Some host chipsets simply don't support MSIs properly.  If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
FADT table.  In this case, Linux automatically disables MSIs.
Some boards don't include this information in the table and so we have
to detect them ourselves.  The complete list of these is found near the
quirk_disable_all_msi() function in drivers/pci/quirks.c.

If you have a board which has problems with MSIs, you can pass pci=nomsi
on the kernel command line to disable MSIs on all devices.  It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including a full 'lspci -v' so we can add the quirks to the kernel.

Disabling MSIs below a bridge
-----------------------------

Some PCI bridges are not able to route MSIs between busses properly.
In this case, MSIs must be disabled on all devices behind the bridge.

Some bridges allow you to enable MSIs by changing some bits in their
PCI configuration space (especially the Hypertransport chipsets such
as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable
MSIs in configuration space using whatever method you know works, then
enable MSIs on that bridge by doing::

  echo 1 > /sys/bus/pci/devices/$bridge/msi_bus

where $bridge is the PCI address of the bridge you've enabled (e.g.
0000:00:0e.0).

To disable MSIs, echo 0 instead of 1.  Changing this value should be
done with caution as it could break interrupt handling for all devices
below this bridge.

Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.

Disabling MSIs on a single device
---------------------------------

Some devices are known to have faulty MSI implementations.  Usually this
is handled in the individual device driver, but occasionally it's necessary
to handle this with a quirk.  Some drivers have an option to disable use
of MSI.  While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.

Finding why MSIs are disabled on a device
-----------------------------------------

From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device.
Your first step should
be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine.  You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.

Then, 'lspci -t' gives the list of bridges above a device.  Reading
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
or disabled (0).  If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.

It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.
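
The msi_bus check on the bridges above a device can be automated from
userspace.  The following is a minimal sketch, not part of the kernel tree:
the helper names read_msi_bus() and msi_blocked_above() are hypothetical,
and the code assumes the usual sysfs layout in which the resolved path of a
device under /sys/devices nests it inside the directories of its upstream
bridges.

```c
#define _XOPEN_SOURCE 700	/* for realpath() */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: read <dir>/msi_bus.
 * Returns 1 or 0 for the attribute value, or -1 if the file is missing
 * or unreadable (for example, dir is not a PCI device directory).
 */
static int read_msi_bus(const char *dir)
{
	char path[PATH_MAX];
	FILE *f;
	int n, c;

	n = snprintf(path, sizeof(path), "%s/msi_bus", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	if (c == '0')
		return 0;
	if (c == '1')
		return 1;
	return -1;
}

/*
 * Hypothetical helper: resolve dev_path (which may be a symlink such as
 * /sys/bus/pci/devices/<address>) and visit every parent directory,
 * nearest first.  The device's own msi_bus file is not examined.
 *
 * Returns 1 and copies the directory into blocker if an ancestor has
 * msi_bus set to 0, 0 if no ancestor does, -1 if dev_path cannot be
 * resolved.
 */
static int msi_blocked_above(const char *dev_path, char *blocker, size_t len)
{
	char resolved[PATH_MAX];
	char *slash;

	assert(dev_path && blocker && len > 0);

	if (!realpath(dev_path, resolved))
		return -1;

	while ((slash = strrchr(resolved, '/')) != NULL && slash != resolved) {
		*slash = '\0';	/* step up to the parent directory */
		if (read_msi_bus(resolved) == 0) {
			snprintf(blocker, len, "%s", resolved);
			return 1;
		}
	}
	return 0;
}
```

A caller would pass the device's sysfs path, for instance
/sys/bus/pci/devices/ followed by the PCI address of the device, together
with a buffer of PATH_MAX bytes.  A return value of 1 means the buffer names
the nearest ancestor whose msi_bus reads 0.  Directories without an msi_bus
file are skipped, so walking past the PCI root is harmless.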