.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==========================
The MSI Driver Guide HOWTO
==========================

:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox

:Copyright: 2003, 2008 Intel Corporation

About this guide
================

This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
to change your driver to use MSI or MSI-X and some basic diagnostics to
try if a device doesn't support MSIs.


What are MSIs?
==============

A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.

The MSI capability was first specified in PCI 2.2 and was later enhanced
in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X
capability was also introduced with PCI 3.0. It supports more interrupts
per device than MSI and allows interrupts to be independently configured.

Devices may support both MSI and MSI-X, but only one can be enabled at
a time.


Why use MSIs?
=============

There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.

Pin-based PCI interrupts are often shared amongst several devices.
To support this, the kernel must call each interrupt handler associated
with an interrupt, which leads to reduced performance for the system as
a whole. MSIs are never shared, so this problem cannot arise.

When a device writes data to memory, then raises a pin-based interrupt,
it is possible that the interrupt may arrive before all the data has
arrived in memory (this becomes more likely with devices behind PCI-PCI
bridges). In order to ensure that all the data has arrived in memory,
the interrupt handler must read a register on the device which raised
the interrupt. PCI transaction ordering rules require that all the data
arrive in memory before the value may be returned from the register.
Using MSIs avoids this problem as the interrupt-generating write cannot
pass the data writes, so by the time the interrupt is raised, the driver
knows that all the data has arrived in memory.

PCI devices can only support a single pin-based interrupt per function.
Often drivers have to query the device to find out what event has
occurred, slowing down interrupt handling for the common case. With
MSIs, a device can support more interrupts, allowing each interrupt
to be specialised to a different purpose.
One possible design gives
infrequent conditions (such as errors) their own interrupt which allows
the driver to handle the normal interrupt handling path more efficiently.
Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.


How to use MSIs
===============

PCI devices are initialised to use pin-based interrupts. The device
driver has to set up the device to use MSI or MSI-X. Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.

Include kernel support for MSIs
-------------------------------

To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled. This option is only available on some architectures,
and it may depend on some other options also being set. For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.

Using MSI
---------

Most of the hard work is done for the driver in the PCI layer. The driver
simply has to request that the PCI layer set up the MSI capability for this
device.

To automatically use MSI or MSI-X interrupt vectors, use the following
function::

  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
                            unsigned int max_vecs, unsigned int flags);

which allocates up to max_vecs interrupt vectors for a PCI device. It
returns the number of vectors allocated or a negative error. If the device
has a requirement for a minimum number of vectors the driver can pass a
min_vecs argument set to this limit, and the PCI core will return -ENOSPC
if it can't meet the minimum number of vectors.

The flags argument is used to specify which type of interrupt can be used
by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
any possible kind of interrupt. If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
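
As a rough illustration of the return-value contract described above, the
following user-space toy model mimics how the vector count is chosen. It is
not the kernel implementation: model_alloc_irq_vectors() and its 'available'
parameter are hypothetical stand-ins for whatever the device and platform
can actually provide, and the model assumes 1 <= min_vecs <= max_vecs.

```c
#include <errno.h>

/*
 * Toy model of the vector-count contract, NOT the kernel implementation.
 * "available" is a hypothetical input standing for the number of vectors
 * the device and platform could provide; a real driver never passes this.
 * Assumes 1 <= min_vecs <= max_vecs.
 */
int model_alloc_irq_vectors(unsigned int available, unsigned int min_vecs,
			    unsigned int max_vecs)
{
	/* Never hand out more than the caller asked for. */
	unsigned int granted = available < max_vecs ? available : max_vecs;

	if (granted < min_vecs)
		return -ENOSPC;	/* cannot meet the driver's minimum */

	return (int)granted;	/* somewhere in [min_vecs, max_vecs] */
}
```

In this model, a request for between 1 and 8 vectors on hardware that can
provide 4 yields 4, while a request for at least 4 vectors on hardware that
can provide only 2 yields -ENOSPC.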

To get the Linux IRQ number to pass to request_irq() and free_irq() for a
given vector, use the following function::

  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

Any allocated resources should be freed before removing the device using
the following function::

  void pci_free_irq_vectors(struct pci_dev *dev);

If a device supports both MSI-X and MSI capabilities, this API will use the
MSI-X facilities in preference to the MSI facilities. MSI-X supports any
number of interrupts between 1 and 2048. In contrast, MSI is restricted to
a maximum of 32 interrupts (and the number must be a power of two). In
addition, the MSI interrupt vectors must be allocated consecutively, so the
system might not be able to allocate as many vectors for MSI as it could for
MSI-X. On some platforms, MSI interrupts must all be targeted at the same
set of CPUs whereas MSI-X interrupts can all be targeted at different CPUs.

If a device supports neither MSI-X nor MSI it will fall back to a single
legacy IRQ vector.

The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device.
If nvec is
larger than the number supported by the device it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand::

  nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
  if (nvec < 0)
          goto out_err;

If a driver is unable or unwilling to deal with a variable number of MSI
interrupts it can request a particular number of interrupts by passing that
number to the pci_alloc_irq_vectors() function as both the 'min_vecs' and
'max_vecs' parameters::

  ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
  if (ret < 0)
          goto out_err;

The best-known example of the request type described above is enabling
the single MSI mode for a device.
This can be done by passing 1 as both
'min_vecs' and 'max_vecs'::

  ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
  if (ret < 0)
          goto out_err;

Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable::

  nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
  if (nvec < 0)
          goto out_err;

Legacy APIs
-----------

The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code::

  pci_enable_msi()              /* deprecated */
  pci_disable_msi()             /* deprecated */
  pci_enable_msix_range()       /* deprecated */
  pci_enable_msix_exact()       /* deprecated */
  pci_disable_msix()            /* deprecated */

Additionally there are APIs to provide the number of supported MSI or MSI-X
vectors: pci_msi_vec_count() and pci_msix_vec_count(). In general these
should be avoided in favor of letting pci_alloc_irq_vectors() cap the
number of vectors. If you have a legitimate special use case for the count
of vectors we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.

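
For contrast with the deprecated calls above, the recommended functions from
the "Using MSI" section might be combined roughly as in the following sketch.
It is a minimal outline under stated assumptions, not code from any real
driver: struct my_dev, MY_MAX_VECS, my_irq_handler(), my_setup_irqs() and
my_teardown_irqs() are hypothetical names, and the sketch requests only MSI
or MSI-X so that the interrupts are never shared.

```c
#include <linux/interrupt.h>
#include <linux/pci.h>

#define MY_MAX_VECS 8		/* hypothetical upper bound for this driver */

struct my_dev {			/* hypothetical per-device state */
	struct pci_dev *pdev;
	int nvec;		/* number of vectors actually allocated */
};

static irqreturn_t my_irq_handler(int irq, void *data)
{
	/* Device-specific interrupt handling would go here. */
	return IRQ_HANDLED;
}

static int my_setup_irqs(struct my_dev *md)
{
	int i, ret;

	/* Accept anything from 1 to MY_MAX_VECS vectors, MSI-X or MSI only. */
	ret = pci_alloc_irq_vectors(md->pdev, 1, MY_MAX_VECS,
				    PCI_IRQ_MSI | PCI_IRQ_MSIX);
	if (ret < 0)
		return ret;
	md->nvec = ret;

	/* Translate each vector index into a Linux IRQ number and claim it. */
	for (i = 0; i < md->nvec; i++) {
		ret = request_irq(pci_irq_vector(md->pdev, i), my_irq_handler,
				  0, "my_dev", md);
		if (ret)
			goto err_free;
	}
	return 0;

err_free:
	/* Release the IRQs requested so far, then the vectors themselves. */
	while (--i >= 0)
		free_irq(pci_irq_vector(md->pdev, i), md);
	pci_free_irq_vectors(md->pdev);
	return ret;
}

static void my_teardown_irqs(struct my_dev *md)
{
	int i;

	for (i = 0; i < md->nvec; i++)
		free_irq(pci_irq_vector(md->pdev, i), md);
	pci_free_irq_vectors(md->pdev);
}
```

Every IRQ is released with free_irq() before pci_free_irq_vectors() is
called, both on the error path and at teardown.
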

Considerations when using MSIs
------------------------------

Spinlocks
~~~~~~~~~

Most device drivers have a per-device spinlock which is taken in the
interrupt handler. With pin-based interrupts or a single MSI, it is not
necessary to disable interrupts (Linux guarantees the same interrupt will
not be re-entered). If a device uses multiple interrupts, the driver
must disable interrupts while the lock is held. Otherwise, if the device
sends a different interrupt while the lock is held, the driver will
deadlock trying to recursively acquire the spinlock. Such deadlocks can be
avoided by using spin_lock_irqsave() or spin_lock_irq() which disable local
interrupts and acquire the lock (see Documentation/kernel-hacking/locking.rst).

How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------

Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
has an 'Enable' flag which is followed by either "+" (enabled)
or "-" (disabled).


MSI quirks
==========

Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:

1. globally
2. on all devices behind a specific bridge
3. on a single device

Disabling MSIs globally
-----------------------

Some host chipsets simply don't support MSIs properly. If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
FADT table. In this case, Linux automatically disables MSIs.
Some boards don't include this information in the table and so we have
to detect them ourselves. The complete list of these is found near the
quirk_disable_all_msi() function in drivers/pci/quirks.c.

If you have a board which has problems with MSIs, you can pass pci=nomsi
on the kernel command line to disable MSIs on all devices. It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including a full 'lspci -v' so we can add the quirks to the kernel.

Disabling MSIs below a bridge
-----------------------------

Some PCI bridges are not able to route MSIs between busses properly.
In this case, MSIs must be disabled on all devices behind the bridge.

Some bridges allow you to enable MSIs by changing some bits in their
PCI configuration space (especially the Hypertransport chipsets such
as the nVidia nForce and Serverworks HT2000). As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable
MSIs in configuration space using whatever method you know works, then
enable MSIs on that bridge by doing::

  echo 1 > /sys/bus/pci/devices/$bridge/msi_bus

where $bridge is the PCI address of the bridge you've enabled (e.g.
0000:00:0e.0).

To disable MSIs, echo 0 instead of 1. Changing this value should be
done with caution as it could break interrupt handling for all devices
below this bridge.

Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.

Disabling MSIs on a single device
---------------------------------

Some devices are known to have faulty MSI implementations. Usually this
is handled in the individual device driver, but occasionally it's necessary
to handle this with a quirk. Some drivers have an option to disable use
of MSI. While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.

Finding why MSIs are disabled on a device
-----------------------------------------

From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device.
Your first step should
be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine. You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.

Then, 'lspci -t' gives the list of bridges above a device. Reading
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
or disabled (0). If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.

It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.


List of device driver MSI(-X) APIs
==================================

The PCI/MSI subsystem has a dedicated C file for its exported device driver
APIs: `drivers/pci/msi/api.c`. The following functions are exported:

.. kernel-doc:: drivers/pci/msi/api.c
   :export:
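
The msi_bus check described under "Finding why MSIs are disabled on a
device" can also be done from a small user-space program rather than by
hand. The following is a minimal sketch, not an official tool:
parse_msi_bus() and read_msi_bus() are hypothetical helper names, and the
sketch assumes the file holds a single "0" or "1" as described above.

```c
#include <stdio.h>

/*
 * Interpret the text read from an msi_bus file.
 * Returns 1 if MSIs are enabled, 0 if disabled, -1 if unrecognised.
 */
int parse_msi_bus(const char *text)
{
	if (text[0] == '1')
		return 1;
	if (text[0] == '0')
		return 0;
	return -1;
}

/*
 * Read /sys/bus/pci/devices/<addr>/msi_bus for one device or bridge.
 * "addr" is a full PCI address such as 0000:00:0e.0.
 * Returns 1 (enabled), 0 (disabled) or -1 (file missing or unreadable).
 */
int read_msi_bus(const char *addr)
{
	char path[128];
	char buf[8] = "";
	FILE *f;

	snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/msi_bus", addr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';	/* empty or unreadable file */
	fclose(f);

	return parse_msi_bus(buf);
}
```

Calling read_msi_bus() for each bridge listed by 'lspci -t' between the PCI
root and the device, and looking for a 0 result, reproduces the manual check.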