.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is the PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around. When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc.)
  - Enable DMA/processing engines

When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions either
completely empty or just returning appropriate error codes to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry.  Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.

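As a minimal sketch, an ID table built from these two macros might look like
the following. The ``foo`` names and the numeric vendor/device IDs are made up
for illustration; only the macros, ``PCI_CLASS_NETWORK_ETHERNET`` and
``MODULE_DEVICE_TABLE()`` come from the kernel headers::

  #include <linux/module.h>
  #include <linux/pci.h>

  /* Hypothetical IDs; a real driver uses the values of its own hardware. */
  #define FOO_VENDOR_ID   0x1234
  #define FOO_DEVICE_ID   0x5678

  static const struct pci_device_id foo_ids[] = {
          /* One specific vendor/device pair, any subsystem IDs. */
          { PCI_DEVICE(FOO_VENDOR_ID, FOO_DEVICE_ID) },
          /* Any Ethernet controller, matched on base class and subclass. */
          { PCI_DEVICE_CLASS(PCI_CLASS_NETWORK_ETHERNET << 8, 0xffff00) },
          { 0, }  /* all-zero terminator */
  };
  MODULE_DEVICE_TABLE(pci, foo_ids);

A real driver would normally list only the entries it can actually drive; the
two entries are shown side by side here purely to contrast the macros.
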
New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need to pass only as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL
  - override_only field defaults to 0

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.

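Putting the pieces of this section together, a bare driver skeleton could
look roughly like this. It is only a sketch: the ``foo`` names and the IDs
are hypothetical, and the probe and remove bodies are left empty::

  #include <linux/init.h>
  #include <linux/module.h>
  #include <linux/pci.h>

  /* Hypothetical IDs, for illustration only. */
  static const struct pci_device_id foo_ids[] = {
          { PCI_DEVICE(0x1234, 0x5678) },
          { 0, }
  };
  MODULE_DEVICE_TABLE(pci, foo_ids);

  /* Called by the PCI core for each unclaimed device matching foo_ids. */
  static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  {
          /* Device initialization goes here. */
          return 0;
  }

  /* Called on hot-unplug or when the driver is unregistered. */
  static void foo_remove(struct pci_dev *pdev)
  {
          /* Device shutdown goes here. */
  }

  static struct pci_driver foo_driver = {
          .name     = "foo",
          .id_table = foo_ids,
          .probe    = foo_probe,
          .remove   = foo_remove,
  };

  static int __init foo_init(void)
  {
          return pci_register_driver(&foo_driver);
  }

  static void __exit foo_exit(void)
  {
          pci_unregister_driver(&foo_driver);
  }

  module_init(foo_init);
  module_exit(foo_exit);
  MODULE_LICENSE("GPL");

Drivers whose init and exit functions do nothing but register and unregister
can use the ``module_pci_driver()`` helper macro in place of the two explicit
functions.
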

"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):

        ======          =================================================
        __init          Initialization code. Thrown away after the driver
                        initializes.
        __exit          Exit code. Ignored for non-modular drivers.
        ======          =================================================

Tips on when/where to use the above attributes:
        - The module_init()/module_exit() functions (and all
          initialization functions called _only_ from these)
          should be marked __init/__exit.

        - Do not mark the struct pci_driver.

        - Do NOT mark a function if you are not sure which mark to use.
          Better to not mark the function than mark the function wrong.


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services.
E.g. combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

        struct pci_dev *dev = NULL;
        while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
                configure_device(dev);

Searching by class ID (iterate in a similar way)::

        pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

        pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().

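The reference counting rules can be illustrated with two small helpers. Both
are hypothetical (the ``foo_`` names are not kernel functions) and are only
meant as a sketch::

  #include <linux/pci.h>

  /* Hypothetical helper: count the devices made by one vendor. */
  static unsigned int foo_count_vendor_devices(unsigned int vendor)
  {
          struct pci_dev *dev = NULL;
          unsigned int count = 0;

          /*
           * Each call drops the reference on the device passed in and takes
           * one on the device returned, so a loop that runs to completion
           * (dev == NULL) leaves nothing to release.
           */
          while ((dev = pci_get_device(vendor, PCI_ANY_ID, dev)))
                  count++;

          return count;
  }

  /* Hypothetical helper: return the first matching device, or NULL. */
  static struct pci_dev *foo_find_first(unsigned int vendor,
                                        unsigned int device)
  {
          /* The caller owns the reference and must call pci_dev_put(). */
          return pci_get_device(vendor, device, NULL);
  }

A loop that exits early with ``break`` still holds a reference on the current
device and has to drop it with pci_dev_put().
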

Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc.)
  - Enable DMA/processing engines

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage.)


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_regions() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/


pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS.  pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi().  This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate.  Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.

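A probe() fragment covering the calls above might be sketched as follows.
``foo_enable()`` is a hypothetical helper name, and treating
Memory-Write-Invalidate as optional is an assumption about the device::

  #include <linux/pci.h>

  /* Hypothetical probe() fragment for a bus-mastering device. */
  static int foo_enable(struct pci_dev *pdev)
  {
          int err;

          err = pci_enable_device(pdev);
          if (err) {
                  dev_err(&pdev->dev, "cannot enable device: %d\n", err);
                  return err;
          }

          /* Set the bus master bit so the device can initiate DMA. */
          pci_set_master(pdev);

          /* Best effort: MWI is nice to have here, not required. */
          if (pci_try_set_mwi(pdev))
                  dev_dbg(&pdev->dev, "Memory-Write-Invalidate not enabled\n");

          return 0;
  }
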

Request MMIO/IOP resources
--------------------------
Memory (MMIO) and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions().

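As a rough sketch, claiming and mapping a single BAR could look like this.
The helper name, the choice of BAR 0 and the "foo" resource name are
assumptions made for the example::

  #include <linux/io.h>
  #include <linux/pci.h>

  #define FOO_BAR 0       /* hypothetical: the registers live in BAR 0 */

  /* Hypothetical helper: claim BAR 0 and map it; returns NULL on failure. */
  static void __iomem *foo_map_regs(struct pci_dev *pdev)
  {
          void __iomem *regs;

          if (!(pci_resource_flags(pdev, FOO_BAR) & IORESOURCE_MEM))
                  return NULL;    /* not an MMIO BAR */

          /* Fails if another driver already owns this range. */
          if (pci_request_region(pdev, FOO_BAR, "foo"))
                  return NULL;

          /* A maxlen of 0 maps the whole BAR. */
          regs = pci_iomap(pdev, FOO_BAR, 0);
          if (!regs)
                  pci_release_region(pdev, FOO_BAR);

          return regs;
  }

The address and size come from the pci_dev resource table (here indirectly,
through pci_iomap()), never from a raw read of the BAR in config space.
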

Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   Documentation/core-api/dma-api.rst. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling dma_set_mask() with
appropriate parameters.  In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
dma_set_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "coherent memory" in System RAM above 4G physical
address by calling dma_set_coherent_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("coherent") data.

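A common pattern, shown here only as a sketch, is to ask for 64-bit
addressing and fall back to 32 bits. ``foo_set_dma_mask()`` is a hypothetical
helper, and ``dma_set_mask_and_coherent()`` is used on the assumption that the
device has the same capability for streaming and coherent DMA::

  #include <linux/dma-mapping.h>
  #include <linux/pci.h>

  /* Hypothetical helper: prefer 64-bit DMA, fall back to 32-bit. */
  static int foo_set_dma_mask(struct pci_dev *pdev)
  {
          int err;

          /* Sets the streaming and the coherent mask in one call. */
          err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
          if (err)
                  err = dma_set_mask_and_coherent(&pdev->dev,
                                                  DMA_BIT_MASK(32));
          if (err)
                  dev_err(&pdev->dev, "no usable DMA configuration\n");

          return err;
  }

A device that is 64-bit capable for streaming data only would instead call
dma_set_mask() and dma_set_coherent_mask() separately, with different masks.
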

Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "coherent" (a.k.a. shared)
memory.  See Documentation/core-api/dma-api.rst for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.

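For illustration, allocating and freeing a small control block might look
like the sketch below. The ``struct foo_ctrl`` layout and the helper names
are invented for the example::

  #include <linux/dma-mapping.h>
  #include <linux/pci.h>

  /* Hypothetical control block shared between the driver and the device. */
  struct foo_ctrl {
          __le32 cmd;
          __le32 status;
  };

  /* Hypothetical helper: allocate the control block. */
  static struct foo_ctrl *foo_alloc_ctrl(struct pci_dev *pdev,
                                         dma_addr_t *bus)
  {
          /* *bus receives the address to program into the device. */
          return dma_alloc_coherent(&pdev->dev, sizeof(struct foo_ctrl),
                                    bus, GFP_KERNEL);
  }

  /* Hypothetical helper: free it again, after DMA has been stopped. */
  static void foo_free_ctrl(struct pci_dev *pdev, struct foo_ctrl *ctrl,
                            dma_addr_t bus)
  {
          dma_free_coherent(&pdev->dev, sizeof(*ctrl), ctrl, bus);
  }
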

Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" registers initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the Interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors() with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors().

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/ethernet/broadcom/tg3.c
for examples of MSI/MSI-X usage.

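The allocation and registration sequence can be sketched as follows. The
handler and helper names are hypothetical, a single vector is assumed, and
the handler body is a placeholder. Kernels newer than the one this document
describes may spell ``PCI_IRQ_LEGACY`` as ``PCI_IRQ_INTX``::

  #include <linux/interrupt.h>
  #include <linux/pci.h>

  /* Hypothetical handler; dev_id is the pointer passed to request_irq(). */
  static irqreturn_t foo_irq(int irq, void *dev_id)
  {
          struct pci_dev *pdev = dev_id;

          /*
           * On a shared INTx line a real handler must first check a
           * device-specific status register and return IRQ_NONE if this
           * device did not raise the interrupt.
           */
          dev_dbg(&pdev->dev, "interrupt %d\n", irq);
          return IRQ_HANDLED;
  }

  /* Hypothetical helper: one vector, MSI-X or MSI preferred, INTx fallback. */
  static int foo_setup_irq(struct pci_dev *pdev)
  {
          int nvec, err;

          nvec = pci_alloc_irq_vectors(pdev, 1, 1,
                          PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
          if (nvec < 0)
                  return nvec;

          /* pci_irq_vector() translates vector 0 to a Linux IRQ number. */
          err = request_irq(pci_irq_vector(pdev, 0), foo_irq, IRQF_SHARED,
                            "foo", pdev);
          if (err)
                  pci_free_irq_vectors(pdev);

          return err;
  }
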

PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts the IRQ line, the system will respond assuming
it was one of the remaining devices that asserted the IRQ line. Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to the "upstream"
owner if there is one.

Then clean up "coherent" buffers which contain the control data.

See Documentation/core-api/dma-api.rst for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
--------------------------------------------------------
iounmap() MMIO or IO Port resources and then call pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.

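The shutdown order described in this chapter can be sketched as a remove()
callback. Everything named ``foo`` is hypothetical, including the two empty
chip-specific helpers. The sketch also assumes that probe() mapped BAR 0 with
pci_iomap(), registered vector 0 with ``pdev`` as the handler cookie, and
allocated the private structure with a managed allocator so that it is not
freed here::

  #include <linux/dma-mapping.h>
  #include <linux/interrupt.h>
  #include <linux/pci.h>

  /* Hypothetical per-device state, stored with pci_set_drvdata() in probe(). */
  struct foo_priv {
          void __iomem *regs;     /* mapping of BAR 0 */
          void *ctrl;             /* coherent control data */
          dma_addr_t ctrl_bus;    /* bus address of the control data */
          size_t ctrl_size;
  };

  /* Hypothetical chip-specific helpers; real ones write device registers. */
  static void foo_hw_mask_irqs(struct foo_priv *priv) { }
  static void foo_hw_stop_dma(struct foo_priv *priv) { }

  static void foo_remove(struct pci_dev *pdev)
  {
          struct foo_priv *priv = pci_get_drvdata(pdev);

          foo_hw_mask_irqs(priv);                 /* stop IRQs on the device */
          free_irq(pci_irq_vector(pdev, 0), pdev);        /* release the IRQ */
          pci_free_irq_vectors(pdev);
          foo_hw_stop_dma(priv);                  /* stop all DMA activity */
          dma_free_coherent(&pdev->dev, priv->ctrl_size,
                            priv->ctrl, priv->ctrl_bus);
          /* Unregister from netdev, SCSI, etc. here. */
          pci_iounmap(pdev, priv->regs);
          pci_disable_device(pdev);
          pci_release_region(pdev, 0);
  }
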

How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access PCI Capability registers, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you. PCI Express Extended Capabilities
are located with pci_find_ext_capability() instead.

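As a small sketch of both kinds of access, the helpers below read the command
register and the power management control/status register. The helper names
are hypothetical; pcibios_err_to_errno() is used here to turn a
``PCIBIOS_...`` code into a negative errno value::

  #include <linux/pci.h>

  /* Hypothetical helper: 1 if memory decoding is enabled, 0 if not. */
  static int foo_mem_decode_enabled(struct pci_dev *pdev)
  {
          u16 cmd;
          int ret;

          ret = pci_read_config_word(pdev, PCI_COMMAND, &cmd);
          if (ret)
                  return pcibios_err_to_errno(ret);

          return !!(cmd & PCI_COMMAND_MEMORY);
  }

  /* Hypothetical helper: read the PM control/status register. */
  static int foo_read_pmcsr(struct pci_dev *pdev, u16 *pmcsr)
  {
          int pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
          int ret;

          if (!pos)
                  return -ENODEV;         /* capability not present */

          ret = pci_read_config_word(pdev, pos + PCI_PM_CTRL, pmcsr);
          return pcibios_err_to_errno(ret);
  }
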

Other interesting functions
===========================

=============================   ================================================
pci_get_domain_bus_and_slot()   Find pci_dev corresponding to given domain,
                                bus, and slot number. If the device is
                                found, its reference count is increased.
pci_set_power_state()           Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()           Find specified capability in device's capability
                                list.
pci_resource_start()            Returns bus start address for a given PCI region
pci_resource_end()              Returns bus end address for a given PCI region
pci_resource_len()              Returns the byte length of a PCI region
pci_set_drvdata()               Set private driver data pointer for a pci_dev
pci_get_drvdata()               Return private driver data pointer for a pci_dev
pci_set_mwi()                   Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()                 Disable Memory-Write-Invalidate transactions.
=============================   ================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user which card it has found), please use pci_name(pci_dev).

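For instance (``foo_announce()`` is a hypothetical helper, shown only as a
sketch)::

  #include <linux/pci.h>
  #include <linux/printk.h>

  static void foo_announce(struct pci_dev *pdev)
  {
          /* pci_name() returns the device's bus address as a string. */
          pr_info("foo: found card at %s\n", pci_name(pdev));
  }
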
497229b4e07SChangbin DuAlways refer to the PCI devices by a pointer to the pci_dev structure.
498229b4e07SChangbin DuAll PCI layer functions use this identification and it's the only
499229b4e07SChangbin Dureasonable one. Don't use bus/slot/function numbers except for very
500229b4e07SChangbin Duspecial purposes -- on systems with multiple primary buses their semantics
501229b4e07SChangbin Ducan be pretty complex.
502229b4e07SChangbin Du
503229b4e07SChangbin DuDon't try to turn on Fast Back to Back writes in your driver.  All devices
504229b4e07SChangbin Duon the bus need to be capable of doing it, so this is something which needs
505229b4e07SChangbin Duto be handled by platform and generic code, not individual drivers.


Vendor and device identifications
=================================

Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers.  You can add private definitions in
your driver if they're helpful, or just use plain hex constants.

The device IDs are arbitrary hex numbers (vendor controlled) and are normally
used in only a single location, the pci_device_id table.
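
A private definition and ID table might look like the sketch below.  It is
written to build in user space, so struct pci_device_id, PCI_ANY_ID and
PCI_DEVICE() here are trimmed stand-ins for the kernel definitions, and the
FOO_* names, the ID values and foo_match() are hypothetical::

       #include <stddef.h>
       #include <stdint.h>

       #define PCI_ANY_ID (~0u)

       /* Trimmed stand-in for the kernel's struct pci_device_id. */
       struct pci_device_id {
               uint32_t vendor, device;
               uint32_t subvendor, subdevice;
       };

       /* Stand-in for PCI_DEVICE(): match vendor/device, any subsystem. */
       #define PCI_DEVICE(vend, dev) \
               .vendor = (vend), .device = (dev), \
               .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID

       /* Private to this driver, so not added to pci_ids.h (placeholders). */
       #define FOO_VENDOR_ID           0x1234
       #define FOO_DEVICE_ID_CARD_A    0x0001
       #define FOO_DEVICE_ID_CARD_B    0x0002

       static const struct pci_device_id foo_ids[] = {
               { PCI_DEVICE(FOO_VENDOR_ID, FOO_DEVICE_ID_CARD_A) },
               { PCI_DEVICE(FOO_VENDOR_ID, FOO_DEVICE_ID_CARD_B) },
               { 0 }   /* all-zero terminator */
       };

       /* Simplified model of the vendor/device match done by the PCI core. */
       static const struct pci_device_id *foo_match(uint32_t vendor,
                                                    uint32_t device)
       {
               const struct pci_device_id *id;

               for (id = foo_ids; id->vendor; id++)
                       if (id->vendor == vendor && id->device == device)
                               return id;
               return NULL;
       }

In a real driver the table is referenced from the id_table field of
struct pci_driver and exported with MODULE_DEVICE_TABLE(pci, foo_ids); the
PCI core then does the matching, so no foo_match() is needed.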

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface.  They are no longer present
in the kernel because they are incompatible with hotplug, PCI domains,
and sane locking.

=================	===========================================
pci_find_device()	Superseded by pci_get_device()
pci_find_subsys()	Superseded by pci_get_subsys()
pci_find_slot()		Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()		Superseded by pci_get_domain_bus_and_slot()
=================	===========================================

The alternative to registering with pci_register_driver() is the
traditional PCI device driver that walks PCI device lists. This is still
possible but discouraged.
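
Where an old driver must still look devices up by ID, the pci_get_device()
loop could be sketched roughly as follows.  This is a user-space mock, not
kernel code: struct pci_dev, mock_bus[] and the two mocked functions only
imitate the reference counting of the real pci_get_device() and
pci_dev_put() (the returned device gains a reference, the "from" device
loses one), and the foo_* names and ID values are hypothetical::

       #include <stddef.h>

       /* Mock device: only the fields this sketch needs. */
       struct pci_dev {
               unsigned short vendor, device;
               int refcount;
       };

       static struct pci_dev mock_bus[] = {
               { 0x1234, 0x0001, 0 },
               { 0x8086, 0x100e, 0 },
               { 0x1234, 0x0001, 0 },
       };
       #define MOCK_NDEVS (sizeof(mock_bus) / sizeof(mock_bus[0]))

       /* Mock: drop one reference; NULL is accepted, as in the kernel. */
       static void pci_dev_put(struct pci_dev *dev)
       {
               if (dev)
                       dev->refcount--;
       }

       /* Mock: continue the search after "from", which loses its reference. */
       static struct pci_dev *pci_get_device(unsigned int vendor,
                                             unsigned int device,
                                             struct pci_dev *from)
       {
               size_t i = from ? (size_t)(from - mock_bus) + 1 : 0;

               pci_dev_put(from);
               for (; i < MOCK_NDEVS; i++) {
                       if (mock_bus[i].vendor == vendor &&
                           mock_bus[i].device == device) {
                               mock_bus[i].refcount++;
                               return &mock_bus[i];
                       }
               }
               return NULL;
       }

       /* The loop runs to completion, so no reference is left held. */
       static int foo_count_devices(void)
       {
               struct pci_dev *pdev = NULL;
               int n = 0;

               while ((pdev = pci_get_device(0x1234, 0x0001, pdev)) != NULL)
                       n++;
               return n;
       }

       /* The caller owns the returned reference and must pci_dev_put() it. */
       static struct pci_dev *foo_find_first(void)
       {
               return pci_get_device(0x1234, 0x0001, NULL);
       }

The point to carry over to a real driver is the reference handling: a loop
that exits early still holds a reference on the current device and has to
release it with pci_dev_put().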


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing-sensitive code should add a readl() where the CPU is
expected to wait before doing other work.  The classic "bit banging"
sequence works fine for I/O Port space::

       for (i = 0; i < 8; i++, val >>= 1) {
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }

The same sequence for MMIO space should be::

       for (i = 0; i < 8; i++, val >>= 1) {
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }

It is important that reading "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl().  Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
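
The reset case can be modelled in user space as in the sketch below.  Only
pci_read_config_word() and the PCI_COMMAND offset correspond to real kernel
names, and here the function is a mock; struct pci_dev, mock_writel(), the
FOO_* register layout and foo_reset() are hypothetical.  The mock holds one
MMIO write "in flight" until a configuration space read drains it::

       #include <stdint.h>

       #define FOO_CTRL        0x00    /* hypothetical control register */
       #define FOO_CTRL_RESET  0x1u    /* hypothetical reset bit */
       #define PCI_COMMAND     0x04    /* config space command register */

       /* Mock device with room for a single posted (in flight) write. */
       struct pci_dev {
               uint32_t regs[4];
               uint32_t posted_val;
               unsigned int posted_off;
               int posted;
               uint16_t command;
       };

       /* Deliver the posted write, if any, to the device registers. */
       static void mock_drain(struct pci_dev *dev)
       {
               if (dev->posted) {
                       dev->regs[dev->posted_off / 4] = dev->posted_val;
                       dev->posted = 0;
               }
       }

       /* Mock MMIO write: returns before the value reaches the device. */
       static void mock_writel(struct pci_dev *dev, uint32_t val,
                               unsigned int off)
       {
               mock_drain(dev);        /* keep writes in order */
               dev->posted_val = val;
               dev->posted_off = off;
               dev->posted = 1;
       }

       /* Mock config read: pushes the posted write ahead of it. */
       static int pci_read_config_word(struct pci_dev *dev, int where,
                                       uint16_t *val)
       {
               mock_drain(dev);
               *val = (where == PCI_COMMAND) ? dev->command : 0xffff;
               return 0;
       }

       static void foo_reset(struct pci_dev *dev)
       {
               uint16_t cmd;

               mock_writel(dev, FOO_CTRL_RESET, FOO_CTRL);
               /* Flush with a config read; an MMIO read might master abort. */
               pci_read_config_word(dev, PCI_COMMAND, &cmd);
       }

In a real driver the write would be a plain writel() to the ioremapped
register, followed by the same pci_read_config_word() call.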