.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

==========================
The MSI Driver Guide HOWTO
==========================

:Authors: Tom L Nguyen; Martine Silbermann; Matthew Wilcox

:Copyright: 2003, 2008 Intel Corporation

About this guide
================

This guide describes the basics of Message Signaled Interrupts (MSIs),
the advantages of using MSI over traditional interrupt mechanisms, how
to change your driver to use MSI or MSI-X and some basic diagnostics to
try if a device doesn't support MSIs.


What are MSIs?
==============

A Message Signaled Interrupt is a write from the device to a special
address which causes an interrupt to be received by the CPU.

The MSI capability was first specified in PCI 2.2 and was later enhanced
in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
capability was also introduced with PCI 3.0.  It supports more interrupts
per device than MSI and allows interrupts to be independently configured.

Devices may support both MSI and MSI-X, but only one can be enabled at
a time.


Why use MSIs?
=============

There are three reasons why using MSIs can give an advantage over
traditional pin-based interrupts.

Pin-based PCI interrupts are often shared amongst several devices.
To support this, the kernel must call each interrupt handler associated
with an interrupt, which leads to reduced performance for the system as
a whole.  MSIs are never shared, so this problem cannot arise.

When a device writes data to memory, then raises a pin-based interrupt,
it is possible that the interrupt may arrive before all the data has
arrived in memory (this becomes more likely with devices behind PCI-PCI
bridges).  In order to ensure that all the data has arrived in memory,
the interrupt handler must read a register on the device which raised
the interrupt.  PCI transaction ordering rules require that all the data
arrive in memory before the value may be returned from the register.
Using MSIs avoids this problem as the interrupt-generating write cannot
pass the data writes, so by the time the interrupt is raised, the driver
knows that all the data has arrived in memory.

PCI devices can only support a single pin-based interrupt per function.
Often drivers have to query the device to find out what event has
occurred, slowing down interrupt handling for the common case.  With
MSIs, a device can support more interrupts, allowing each interrupt
to be specialised to a different purpose.  One possible design gives
infrequent conditions (such as errors) their own interrupt which allows
the driver to handle the normal interrupt handling path more efficiently.
Other possible designs include giving one interrupt to each packet queue
in a network card or each port in a storage controller.


How to use MSIs
===============

PCI devices are initialised to use pin-based interrupts.  The device
driver has to set up the device to use MSI or MSI-X.  Not all machines
support MSIs correctly, and for those machines, the APIs described below
will simply fail and the device will continue to use pin-based interrupts.

Include kernel support for MSIs
-------------------------------

To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
option enabled.  This option is only available on some architectures,
and it may depend on some other options also being set.  For example,
on x86, you must also enable X86_UP_APIC or SMP in order to see the
CONFIG_PCI_MSI option.

Using MSI
---------

Most of the hard work is done for the driver in the PCI layer.  The driver
simply has to request that the PCI layer set up the MSI capability for this
device.

To automatically use MSI or MSI-X interrupt vectors, use the following
function::

  int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
		unsigned int max_vecs, unsigned int flags);

which allocates up to max_vecs interrupt vectors for a PCI device.  It
returns the number of vectors allocated or a negative error.  If the device
requires a minimum number of vectors, the driver can pass a min_vecs
argument set to this limit, and the PCI core will return -ENOSPC if it
can't meet the minimum number of vectors.

The flags argument is used to specify which type of interrupt can be used
by the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.

To get the Linux IRQ number of a vector, as passed to request_irq() and
free_irq(), use the following function::

  int pci_irq_vector(struct pci_dev *dev, unsigned int nr);

Any allocated resources should be freed before removing the device using
the following function::

  void pci_free_irq_vectors(struct pci_dev *dev);
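
Taken together, these three calls cover the whole lifetime of a driver's
interrupt vectors.  The following is only a minimal sketch and is not taken
from any in-tree driver: the foo_ names, the FOO_MAX_VECS limit and the
foo_dev structure are hypothetical, and the fragment builds only as part of
a kernel driver.  It asks for MSI or MSI-X only, so that no vector is shared
and request_irq() can be called without IRQF_SHARED::

	#include <linux/interrupt.h>
	#include <linux/pci.h>

	#define FOO_MAX_VECS	8	/* hypothetical per-device limit */

	struct foo_dev {
		struct pci_dev *pdev;
		int nvec;		/* vectors actually allocated */
	};

	static irqreturn_t foo_irq(int irq, void *data)
	{
		/* device-specific handling would go here */
		return IRQ_HANDLED;
	}

	static int foo_setup_irqs(struct foo_dev *foo)
	{
		int i, ret;

		/* accept anything from 1 to FOO_MAX_VECS vectors */
		ret = pci_alloc_irq_vectors(foo->pdev, 1, FOO_MAX_VECS,
					    PCI_IRQ_MSI | PCI_IRQ_MSIX);
		if (ret < 0)
			return ret;
		foo->nvec = ret;

		for (i = 0; i < foo->nvec; i++) {
			/* translate vector index i into a Linux IRQ number */
			ret = request_irq(pci_irq_vector(foo->pdev, i),
					  foo_irq, 0, "foo", foo);
			if (ret)
				goto err_free;
		}
		return 0;

	err_free:
		/* release the IRQs requested so far, then the vectors */
		while (--i >= 0)
			free_irq(pci_irq_vector(foo->pdev, i), foo);
		pci_free_irq_vectors(foo->pdev);
		return ret;
	}

	static void foo_teardown_irqs(struct foo_dev *foo)
	{
		int i;

		for (i = 0; i < foo->nvec; i++)
			free_irq(pci_irq_vector(foo->pdev, i), foo);
		pci_free_irq_vectors(foo->pdev);
	}

In this sketch the IRQs are released with free_irq() before
pci_free_irq_vectors() is called, mirroring the order in which they were
set up.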

If a device supports both MSI-X and MSI capabilities, this API will use the
MSI-X facilities in preference to the MSI facilities.  MSI-X supports any
number of interrupts between 1 and 2048.  In contrast, MSI is restricted to
a maximum of 32 interrupts, and the number must be a power of two.  In
addition, the MSI interrupt vectors must be allocated consecutively, so the
system might not be able to allocate as many vectors for MSI as it could
for MSI-X.  On some platforms, MSI interrupts must all be targeted at the
same set of CPUs whereas MSI-X interrupts can all be targeted at different
CPUs.
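
To make the power-of-two restriction concrete, the helper below computes how
many MSI vectors would have to be enabled to cover a given request.  It is
plain C written for illustration only, not kernel code; the name
msi_vectors_needed() and the MSI_MAX_VECS constant are made up for this
example::

	#include <assert.h>

	#define MSI_MAX_VECS 32u	/* MSI limit stated above */

	/*
	 * Return the smallest power of two that is >= wanted, or 0 if the
	 * request cannot be met with MSI (wanted is 0 or exceeds 32).
	 */
	unsigned int msi_vectors_needed(unsigned int wanted)
	{
		unsigned int n = 1;

		if (wanted == 0 || wanted > MSI_MAX_VECS)
			return 0;
		while (n < wanted)
			n <<= 1;
		assert(n <= MSI_MAX_VECS);
		return n;
	}

For example, msi_vectors_needed(5) yields 8, whereas MSI-X could provide
exactly 5 vectors.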

If a device supports neither MSI-X nor MSI, it will fall back to a single
legacy IRQ vector.

The typical usage of MSI or MSI-X interrupts is to allocate as many vectors
as possible, likely up to the limit supported by the device.  If nvec is
larger than the number supported by the device, it will automatically be
capped to the supported limit, so there is no need to query the number of
vectors supported beforehand::

	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_ALL_TYPES);
	if (nvec < 0)
		goto out_err;

If a driver is unable or unwilling to deal with a variable number of MSI
interrupts, it can request a particular number of interrupts by passing that
number to the pci_alloc_irq_vectors() function as both the 'min_vecs' and
'max_vecs' parameters::

	ret = pci_alloc_irq_vectors(pdev, nvec, nvec, PCI_IRQ_ALL_TYPES);
	if (ret < 0)
		goto out_err;

The best-known example of the request type described above is enabling
the single MSI mode for a device.  This can be done by passing two 1s as
'min_vecs' and 'max_vecs'::

	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
	if (ret < 0)
		goto out_err;

Some devices might not support using legacy line interrupts, in which case
the driver can specify that only MSI or MSI-X is acceptable::

	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSI | PCI_IRQ_MSIX);
	if (nvec < 0)
		goto out_err;

Legacy APIs
-----------

The following old APIs to enable and disable MSI or MSI-X interrupts should
not be used in new code::

  pci_enable_msi()		/* deprecated */
  pci_disable_msi()		/* deprecated */
  pci_enable_msix_range()	/* deprecated */
  pci_enable_msix_exact()	/* deprecated */
  pci_disable_msix()		/* deprecated */

Additionally there are APIs to provide the number of supported MSI or MSI-X
vectors: pci_msi_vec_count() and pci_msix_vec_count().  In general these
should be avoided in favor of letting pci_alloc_irq_vectors() cap the
number of vectors.  If you have a legitimate special use case for the count
of vectors we might have to revisit that decision and add a
pci_nr_irq_vectors() helper that handles MSI and MSI-X transparently.

Considerations when using MSIs
------------------------------

Spinlocks
~~~~~~~~~

Most device drivers have a per-device spinlock which is taken in the
interrupt handler.  With pin-based interrupts or a single MSI, it is not
necessary to disable interrupts (Linux guarantees the same interrupt will
not be re-entered).  If a device uses multiple interrupts, the driver
must disable interrupts while the lock is held.  Otherwise, if the device
sends a different interrupt while the lock is held, the driver will
deadlock trying to recursively acquire the spinlock.  Such deadlocks can
be avoided by using spin_lock_irqsave() or spin_lock_irq(), which disable
local interrupts and acquire the lock (see
Documentation/kernel-hacking/locking.rst).
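
As a sketch of this pattern, a handler used for several vectors of one
device might take the per-device lock as shown here.  The foo_ names and
the structure layout are hypothetical, and the fragment builds only as part
of a kernel driver; spin_lock_irqsave() and spin_unlock_irqrestore() are the
kernel interfaces being illustrated::

	#include <linux/interrupt.h>
	#include <linux/spinlock.h>

	struct foo_dev {
		spinlock_t lock;	/* set up with spin_lock_init() */
		/* state shared between the vectors would live here */
	};

	static irqreturn_t foo_queue_irq(int irq, void *data)
	{
		struct foo_dev *foo = data;
		unsigned long flags;

		/*
		 * Disable local interrupts while holding the lock, so that
		 * another vector of the same device cannot interrupt this
		 * handler on this CPU and spin on a lock already held.
		 */
		spin_lock_irqsave(&foo->lock, flags);
		/* touch the state shared between the vectors */
		spin_unlock_irqrestore(&foo->lock, flags);

		return IRQ_HANDLED;
	}

The saved flags are restored on unlock, so the handler leaves the interrupt
state as it found it.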

How to tell whether MSI/MSI-X is enabled on a device
----------------------------------------------------

Running 'lspci -v' (as root) may show some devices with "MSI", "Message
Signalled Interrupts" or "MSI-X" capabilities.  Each of these capabilities
has an 'Enable' flag which is followed by either "+" (enabled)
or "-" (disabled).


MSI quirks
==========

Several PCI chipsets or devices are known not to support MSIs.
The PCI stack provides three ways to disable MSIs:

1. globally
2. on all devices behind a specific bridge
3. on a single device

Disabling MSIs globally
-----------------------

Some host chipsets simply don't support MSIs properly.  If we're
lucky, the manufacturer knows this and has indicated it in the ACPI
FADT table.  In this case, Linux automatically disables MSIs.
Some boards don't include this information in the table and so we have
to detect them ourselves.  The complete list of these is found near the
quirk_disable_all_msi() function in drivers/pci/quirks.c.

If you have a board which has problems with MSIs, you can pass pci=nomsi
on the kernel command line to disable MSIs on all devices.  It would be
in your best interests to report the problem to linux-pci@vger.kernel.org
including the full 'lspci -v' output so we can add the quirks to the kernel.

Disabling MSIs below a bridge
-----------------------------

Some PCI bridges are not able to route MSIs between buses properly.
In this case, MSIs must be disabled on all devices behind the bridge.

Some bridges allow you to enable MSIs by changing some bits in their
PCI configuration space (especially the Hypertransport chipsets such
as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
Linux mostly knows about them and automatically enables MSIs if it can.
If you have a bridge unknown to Linux, you can enable
MSIs in configuration space using whatever method you know works, then
enable MSIs on that bridge by doing::

       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus

where $bridge is the PCI address of the bridge you've enabled (e.g.
0000:00:0e.0).

To disable MSIs, echo 0 instead of 1.  Changing this value should be
done with caution as it could break interrupt handling for all devices
below this bridge.

Again, please notify linux-pci@vger.kernel.org of any bridges that need
special handling.

Disabling MSIs on a single device
---------------------------------

Some devices are known to have faulty MSI implementations.  Usually this
is handled in the individual device driver, but occasionally it's necessary
to handle this with a quirk.  Some drivers have an option to disable use
of MSI.  While this is a convenient workaround for the driver author,
it is not good practice, and should not be emulated.

Finding why MSIs are disabled on a device
-----------------------------------------

From the above three sections, you can see that there are many reasons
why MSIs may not be enabled for a given device.  Your first step should
be to examine your dmesg carefully to determine whether MSIs are enabled
for your machine.  You should also check your .config to be sure you
have enabled CONFIG_PCI_MSI.

Then, 'lspci -t' gives the list of bridges above a device. Reading
`/sys/bus/pci/devices/*/msi_bus` will tell you whether MSIs are enabled (1)
or disabled (0).  If 0 is found in any of the msi_bus files belonging
to bridges between the PCI root and the device, MSIs are disabled.
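
The same check can be made from a program.  The following userspace sketch
reads one msi_bus file and reports its value; the function name
read_msi_bus() is invented for this example, and the caller is assumed to
supply the sysfs path of each bridge found with 'lspci -t'::

	#include <assert.h>
	#include <stdio.h>

	/*
	 * Read an msi_bus attribute.  Returns 1 if MSIs are enabled, 0 if
	 * they are disabled, and -1 if the file cannot be read or parsed.
	 */
	int read_msi_bus(const char *path)
	{
		FILE *f;
		int value;
		int ok;

		assert(path != NULL);
		f = fopen(path, "r");
		if (!f)
			return -1;
		ok = (fscanf(f, "%d", &value) == 1);
		fclose(f);
		if (!ok || (value != 0 && value != 1))
			return -1;
		return value;
	}

A caller would invoke it once per bridge, for instance on
/sys/bus/pci/devices/0000:00:0e.0/msi_bus, and treat a 0 from any bridge on
the path as meaning that MSIs are disabled for the device.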

It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_alloc_irq_vectors() with the
PCI_IRQ_MSI or PCI_IRQ_MSIX flags.