.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==========================
The MSI Driver Guide HOWTO
==========================

:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox

:Copyright: 2003, 2008 Intel Corporation

About this guide
================

This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
to change your driver to use MSI or MSI-X, and some basic diagnostics to
try if a device doesn't support MSIs.


What are MSIs?
==============

A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.

The MSI capability was first specified in PCI 2.2 and was later enhanced
in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
capability was also introduced with PCI 3.0.  It supports more interrupts
per device than MSI and allows interrupts to be independently configured.

Devices may support both MSI and MSI-X, but only one can be enabled at
a time.


Why use MSIs?
=============

There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.

Pin-based PCI interrupts are often shared amongst several devices.
To support this, the kernel must call each interrupt handler associated
with an interrupt, which leads to reduced performance for the system as
a whole.  MSIs are never shared, so this problem cannot arise.

When a device writes data to memory, then raises a pin-based interrupt,
it is possible that the interrupt may arrive before all the data has
arrived in memory (this becomes more likely with devices behind PCI-PCI
bridges).  In order to ensure that all the data has arrived in memory,
the interrupt handler must read a register on the device which raised
the interrupt.  PCI transaction ordering rules require that all the data
arrive in memory before the value may be returned from the register.
Using MSIs avoids this problem as the interrupt-generating write cannot
pass the data writes, so by the time the interrupt is raised, the driver
knows that all the data has arrived in memory.

PCI devices can only support a single pin-based interrupt per function.
Often drivers have to query the device to find out what event has
occurred, slowing down interrupt handling for the common case.  With
MSIs, a device can support more interrupts, allowing each interrupt
to be specialised to a different purpose.  One possible design gives
infrequent conditions (such as errors) their own interrupt which allows
the driver to handle the normal interrupt handling path more efficiently.
Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.


How to use MSIs
===============

PCI devices are initialised to use pin-based interrupts.  The device
driver has to set up the device to use MSI or MSI-X.  Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.

Include kernel support for MSIs
-------------------------------

To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled.  This option is only available on some architectures,
and it may depend on some other options also being set.  For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.

Using MSI
---------

Most of the hard work is done for the driver in the PCI layer.  The driver
simply has to request that the PCI layer set up the MSI capability for this
device.

To automatically use MSI or MSI-X interrupt vectors, use the following
function::

  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
		unsigned int max_vecs, unsigned int flags);

which allocates up to max_vecs interrupt vectors for a PCI device.  It
returns the number of vectors allocated or a negative error.  If the device
has a requirement for a minimum number of vectors, the driver can pass a
min_vecs argument set to this limit, and the PCI core will return -ENOSPC
if it can't meet the minimum number of vectors.

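The min_vecs/max_vecs contract above can be modelled in a few lines of ordinary userspace C.  This is a hypothetical sketch, not the kernel's implementation: model_alloc_irq_vectors() and its hw_limit parameter are invented for illustration, with hw_limit standing in for however many vectors the device and platform can actually provide.  It ignores the interrupt-type flags and the legacy fallback, and assumes 1 <= min_vecs <= max_vecs.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical userspace model of the documented contract of
 * pci_alloc_irq_vectors().  'hw_limit' is an assumption standing in
 * for the number of vectors the device/platform can provide.
 */
static int model_alloc_irq_vectors(unsigned int hw_limit,
				   unsigned int min_vecs,
				   unsigned int max_vecs)
{
	unsigned int nvec = max_vecs;

	/* Assumed precondition of this model. */
	assert(min_vecs >= 1 && min_vecs <= max_vecs);

	/* Silently cap the request at what is available. */
	if (nvec > hw_limit)
		nvec = hw_limit;

	/* The minimum cannot be met: report -ENOSPC. */
	if (nvec < min_vecs)
		return -ENOSPC;

	/* Otherwise return the number of vectors "allocated". */
	return (int)nvec;
}
```

With hw_limit = 16, asking for between 1 and 64 vectors yields 16, while insisting on at least 4 vectors when only 2 are available yields -ENOSPC.
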
The flags argument is used to specify which type of interrupt can be used
by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
A convenient shorthand (PCI_IRQ_ALL_TYPES) is also available to ask for
any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.

To get the Linux IRQ number for a given vector, which is the number that
request_irq() and free_irq() expect, use the following function::

  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

where nr is the vector index, counting from 0.

Any allocated resources should be freed, using the following function,
before the device is removed::

  void pci_free_irq_vectors(struct pci_dev *dev);

If a device supports both MSI-X and MSI capabilities, this API will use the
MSI-X facilities in preference to the MSI facilities.  MSI-X supports any
number of interrupts between 1 and 2048.  In contrast, MSI is restricted to
a maximum of 32 interrupts (and must be a power of two).  In addition, the
MSI interrupt vectors must be allocated consecutively, so the system might
not be able to allocate as many vectors for MSI as it could for MSI-X.  On
some platforms, MSI interrupts must all be targeted at the same set of CPUs,
whereas MSI-X interrupts can all be targeted at different CPUs.

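The 32 and 2048 limits follow from how each capability encodes its size in its Message Control register: MSI stores log2 of the vector count in a 3-bit Multiple Message Capable field, while MSI-X stores the table size minus one in an 11-bit field.  The sketch below decodes both in plain userspace C.  The mask values mirror PCI_MSI_FLAGS_QMASK and PCI_MSIX_FLAGS_QSIZE from include/uapi/linux/pci_regs.h; the helper names are hypothetical, and reading the register itself (pci_read_config_word() in a driver) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks mirroring PCI_MSI_FLAGS_QMASK and PCI_MSIX_FLAGS_QSIZE. */
#define MSI_FLAGS_QMASK		0x000e	/* MSI Multiple Message Capable: log2(count), bits 3:1 */
#define MSIX_FLAGS_QSIZE	0x07ff	/* MSI-X Table Size: count - 1, bits 10:0 */

/* Vectors an MSI function advertises: 1, 2, 4, 8, 16 or 32
 * (encodings 6 and 7 are reserved). */
static unsigned int msi_vec_count(uint16_t msgctl)
{
	return 1u << ((msgctl & MSI_FLAGS_QMASK) >> 1);
}

/* Entries in an MSI-X table: any value from 1 to 2048. */
static unsigned int msix_vec_count(uint16_t msgctl)
{
	return (msgctl & MSIX_FLAGS_QSIZE) + 1u;
}

/* Multiple Message Enable value for a request of nvec MSI vectors.
 * That field is also log2-encoded, so the request is effectively
 * rounded up to the next power of two. */
static unsigned int msi_mme_for(unsigned int nvec)
{
	unsigned int order = 0;

	assert(nvec >= 1 && nvec <= 32);
	while ((1u << order) < nvec)
		order++;
	return order;
}
```

In this model, an MSI Message Control value of 0x0086 (64-bit capable, Multiple Message Capable = 3) advertises 8 vectors, and a request for 3 vectors is encoded as Multiple Message Enable = 2, that is, 4 vectors.
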
If a device supports neither MSI-X nor MSI, it will fall back to a single
legacy IRQ vector, provided PCI_IRQ_LEGACY is included in flags.

The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device.  If nvec is
larger than the number supported by the device, it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand::

	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
	if (nvec < 0)
		goto out_err;

If a driver is unable or unwilling to deal with a variable number of MSI
interrupts, it can request a particular number of interrupts by passing that
number to pci_alloc_irq_vectors() as both the 'min_vecs' and 'max_vecs'
parameters::

	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
	if (ret < 0)
		goto out_err;

The most common example of this kind of request is enabling single MSI
mode for a device, which is done by passing 1 as both 'min_vecs' and
'max_vecs'::

	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
	if (ret < 0)
		goto out_err;

Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable::

	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
	if (nvec < 0)
		goto out_err;

Legacy APIs
-----------

The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code::

  pci_enable_msi()		/* deprecated */
  pci_disable_msi()		/* deprecated */
  pci_enable_msix_range()	/* deprecated */
  pci_enable_msix_exact()	/* deprecated */
  pci_disable_msix()		/* deprecated */

Additionally, there are APIs to provide the number of supported MSI or MSI-X
vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general, these
should be avoided in favor of letting pci_alloc_irq_vectors() cap the
number of vectors.  If you have a legitimate special use case for the count
of vectors, we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.

Considerations when using MSIs
------------------------------

Spinlocks
~~~~~~~~~

Most device drivers have a per-device spinlock which is taken in the
interrupt handler.  With pin-based interrupts or a single MSI, it is not
necessary to disable interrupts (Linux guarantees the same interrupt will
not be re-entered).  If a device uses multiple interrupts, the driver
must disable interrupts while the lock is held.  Otherwise, if the device
sends a different interrupt to the CPU that holds the lock, that
interrupt's handler will deadlock trying to acquire the spinlock
recursively.  Such deadlocks can be avoided by using spin_lock_irqsave()
or spin_lock_irq(), which disable local interrupts and acquire the lock
(see Documentation/kernel-hacking/locking.rst).

How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------

Using 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
has an 'Enable' flag which is followed by either "+" (enabled)
or "-" (disabled).

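If you want to script this check, the flag can be pulled out of a single line of 'lspci -v' output.  The sketch below is a hypothetical helper, not part of any existing tool, and it assumes the current lspci layout in which the capability name is immediately followed by ": Enable+" or ": Enable-" (for example "MSI: Enable+ Count=1/1 Maskable- 64bit+"); other layouts, such as the older "Message Signalled Interrupts" form, are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: report the Enable flag for capability 'cap'
 * ("MSI" or "MSI-X") on one line of 'lspci -v' output.
 * Returns 1 for "Enable+", 0 for "Enable-", and -1 if the line does
 * not describe that capability in the expected layout.
 */
static int lspci_cap_enabled(const char *line, const char *cap)
{
	char needle[64];
	const char *p;
	int n;

	assert(line != NULL && cap != NULL);
	n = snprintf(needle, sizeof(needle), "%s: Enable", cap);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return -1;

	/* "MSI: Enable" cannot match inside "MSI-X: Enable",
	 * because '-' follows "MSI" there. */
	p = strstr(line, needle);
	if (p == NULL)
		return -1;

	p += n;		/* character right after "Enable" */
	if (*p == '+')
		return 1;
	if (*p == '-')
		return 0;
	return -1;
}
```

Each line is checked on its own, so a caller would feed it the "Capabilities:" lines of one device in turn.
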

MSI quirks
==========

Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:

1. globally
2. on all devices behind a specific bridge
3. on a single device

Disabling MSIs globally
-----------------------

Some host chipsets simply don't support MSIs properly.  If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
FADT table.  In this case, Linux automatically disables MSIs.
Some boards don't include this information in the table and so we have
to detect them ourselves.  The complete list of these is found near the
quirk_disable_all_msi() function in drivers/pci/quirks.c.

If you have a board which has problems with MSIs, you can pass pci=nomsi
on the kernel command line to disable MSIs on all devices.  It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including a full 'lspci -v' so we can add the quirks to the kernel.

Disabling MSIs below a bridge
-----------------------------

Some PCI bridges are not able to route MSIs between busses properly.
In this case, MSIs must be disabled on all devices behind the bridge.

Some bridges allow you to enable MSIs by changing some bits in their
PCI configuration space (especially the Hypertransport chipsets such
as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable MSIs in its
configuration space using whatever method you know works, then
enable MSIs on that bridge by doing::

       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus

where $bridge is the PCI address of the bridge you've enabled (e.g.
0000:00:0e.0).

To disable MSIs, echo 0 instead of 1.  Changing this value should be
done with caution as it could break interrupt handling for all devices
below this bridge.

Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.

Disabling MSIs on a single device
---------------------------------

Some devices are known to have faulty MSI implementations.  Usually this
is handled in the individual device driver, but occasionally it's necessary
to handle this with a quirk.  Some drivers have an option to disable use
of MSI.  While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.

Finding why MSIs are disabled on a device
-----------------------------------------

From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device.  Your first step should
be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine.  You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.

Then, 'lspci -t' gives the list of bridges above a device.  Reading
``/sys/bus/pci/devices/*/msi_bus`` will tell you whether MSIs are enabled (1)
or disabled (0).  If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.

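The msi_bus check can also be scripted.  The sketch below is a hypothetical userspace helper, not an existing tool: read_msi_bus() reads one msi_bus attribute, and msi_allowed_on_path() applies the rule above to a caller-supplied list of files, one per bridge between the PCI root and the device (a list you might assemble by hand from 'lspci -t').  Paths such as /sys/bus/pci/devices/0000:00:0e.0/msi_bus are examples only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read one msi_bus attribute.  Returns 1 (enabled), 0 (disabled),
 * or -1 if the file cannot be opened or parsed. */
static int read_msi_bus(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Apply the rule to the msi_bus files of every bridge between the PCI
 * root and the device: MSIs are usable only if none of them reads 0.
 * Returns 1 if usable, 0 if some bridge disables MSIs, -1 on a read
 * error (files are checked in the order given). */
static int msi_allowed_on_path(const char *const paths[], size_t n)
{
	size_t i;

	assert(paths != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int v = read_msi_bus(paths[i]);

		if (v < 0)
			return -1;
		if (v == 0)
			return 0;
	}
	return 1;
}
```

A single bridge whose msi_bus reads 0 is enough for msi_allowed_on_path() to report MSIs as unavailable for every device below it.
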
It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.


List of device driver MSI(-X) APIs
==================================

The PCI/MSI subsystem has a dedicated C file for its exported device driver
APIs: ``drivers/pci/msi/api.c``.  The following functions are exported:

.. kernel-doc:: drivers/pci/msi/api.c
   :export: