.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is that the PCI
support in the Linux kernel is not as trivial as one would wish. This short
paper tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around. When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines

When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions either
completely empty or just returning appropriate error codes to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry.  Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.

New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need pass only as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.

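As an illustration of the registration flow described in this section, here
is a minimal sketch of a driver skeleton. It only builds as part of a kernel
module. The names ``my_pci_ids``, ``my_probe``, ``my_remove``, ``my_driver``
and the vendor/device ID values are hypothetical placeholders, and the probe
and remove bodies are left empty on purpose::

	#include <linux/module.h>
	#include <linux/pci.h>

	/* Hypothetical IDs; replace with the IDs of the real hardware. */
	#define MY_VENDOR_ID	0x1234
	#define MY_DEVICE_ID	0x5678

	/* ID table, terminated by an all-zero entry. */
	static const struct pci_device_id my_pci_ids[] = {
		{ PCI_DEVICE(MY_VENDOR_ID, MY_DEVICE_ID) },
		{ 0, }
	};
	MODULE_DEVICE_TABLE(pci, my_pci_ids);

	/* Called by the PCI core for each matching, unclaimed device. */
	static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		/* Device initialization goes here. */
		return 0;
	}

	/* Called on hot-unplug or from pci_unregister_driver(). */
	static void my_remove(struct pci_dev *pdev)
	{
		/* Device shutdown goes here. */
	}

	static struct pci_driver my_driver = {
		.name		= "my_pci_driver",
		.id_table	= my_pci_ids,
		.probe		= my_probe,
		.remove		= my_remove,
	};

	static int __init my_init(void)
	{
		return pci_register_driver(&my_driver);
	}

	static void __exit my_exit(void)
	{
		pci_unregister_driver(&my_driver);
	}

	module_init(my_init);
	module_exit(my_exit);
	MODULE_LICENSE("GPL");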

"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):

	======		=================================================
	__init		Initialization code. Thrown away after the driver
			initializes.
	__exit		Exit code. Ignored for non-modular drivers.
	======		=================================================

Tips on when/where to use the above attributes:
	- The module_init()/module_exit() functions (and all
	  initialization functions called _only_ from these)
	  should be marked __init/__exit.

	- Do not mark the struct pci_driver.

	- Do NOT mark a function if you are not sure which mark to use.
	  Better to not mark the function than mark the function wrong.


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services.
E.g. combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

	struct pci_dev *dev = NULL;
	while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
		configure_device(dev);

Searching by class ID (iterate in a similar way)::

	pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

	pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().

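The reference counting rule can be illustrated with a short sketch. It
assumes a driver that wants to hold on to the first matching device;
``my_dev``, ``my_find_first()``, ``my_release()`` and the ID values are
hypothetical names chosen for this example::

	#include <linux/pci.h>

	#define MY_VENDOR_ID	0x1234	/* hypothetical */
	#define MY_DEVICE_ID	0x5678	/* hypothetical */

	static struct pci_dev *my_dev;

	static int my_find_first(void)
	{
		/*
		 * The returned pci_dev already has its reference count
		 * incremented, so the pointer stays valid even if the
		 * device is hot-unplugged later.
		 */
		my_dev = pci_get_device(MY_VENDOR_ID, MY_DEVICE_ID, NULL);
		if (!my_dev)
			return -ENODEV;
		return 0;
	}

	static void my_release(void)
	{
		/* Drop the reference taken by pci_get_device(). */
		pci_dev_put(my_dev);
		my_dev = NULL;
	}

When iterating in a loop, pci_get_device() drops the reference on the device
passed in as its last argument, so a loop that runs until NULL is returned
leaves no reference behind. Only a loop that exits early still holds one.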

Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines.

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage).


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_resources() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   http://lkml.org/lkml/2006/3/2/194


pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS.  pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi().  This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate.  Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.

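A minimal sketch of this step, as it might appear in a probe routine, is
shown below. ``my_enable()`` is a hypothetical helper name, and the sketch
assumes that Memory-Write-Invalidate is merely nice to have for the device::

	#include <linux/pci.h>

	/* Hypothetical helper called from the driver's probe routine. */
	static int my_enable(struct pci_dev *pdev)
	{
		int err;

		/* Wake the device and make its resources usable. */
		err = pci_enable_device(pdev);
		if (err) {
			dev_err(&pdev->dev, "cannot enable device: %d\n", err);
			return err;
		}

		/* Set the bus master bit so the device may initiate DMA. */
		pci_set_master(pdev);

		/*
		 * MWI is optional here, so use the best-effort variant
		 * and only report a failure.
		 */
		if (pci_try_set_mwi(pdev))
			dev_info(&pdev->dev, "MWI not available\n");

		return 0;
	}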

Request MMIO/IOP resources
--------------------------
Memory (MMIO), and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions() below.

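The following sketch shows one way to claim and map a BAR. It assumes the
device registers live in BAR 0; ``MY_BAR`` and ``my_map_regs()`` are
hypothetical names. pci_iomap() is not otherwise covered in this document
and is used here only as one common way to obtain a mapping::

	#include <linux/pci.h>

	#define MY_BAR	0	/* hypothetical: registers live in BAR 0 */

	static void __iomem *my_map_regs(struct pci_dev *pdev)
	{
		void __iomem *regs;

		/* Claim the BAR so no other driver can use the same range. */
		if (pci_request_region(pdev, MY_BAR, "my_pci_driver"))
			return NULL;

		/* Map the whole BAR (a length of 0 means "entire region"). */
		regs = pci_iomap(pdev, MY_BAR, 0);
		if (!regs)
			pci_release_region(pdev, MY_BAR);

		return regs;
	}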

Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   Documentation/DMA-API.txt. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling pci_set_dma_mask() with
appropriate parameters.  In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
pci_set_dma_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "consistent memory" in System RAM above 4G physical
address by calling pci_set_consistent_dma_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("consistent") data.

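As a sketch only, a driver for a 64-bit capable device might try the wide
masks first and fall back to 32 bits. ``my_set_dma_masks()`` is a
hypothetical helper. The two mask functions are the ones named in this
section; whether they are available depends on the kernel version, and newer
code typically uses the generic dma_set_mask_and_coherent() instead::

	#include <linux/dma-mapping.h>
	#include <linux/pci.h>

	static int my_set_dma_masks(struct pci_dev *pdev)
	{
		/* Prefer 64-bit addressing for streaming and consistent DMA. */
		if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
		    !pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)))
			return 0;

		/* Fall back to 32-bit addressing for both. */
		if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32)) &&
		    !pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
			return 0;

		dev_err(&pdev->dev, "no usable DMA configuration\n");
		return -EIO;
	}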

Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory.  See Documentation/DMA-API.txt for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.

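For illustration, allocating and freeing a small block of shared control
memory could look like the sketch below. ``struct my_priv``,
``MY_RING_BYTES`` and the two helpers are hypothetical; the size is an
arbitrary example value::

	#include <linux/dma-mapping.h>
	#include <linux/pci.h>

	#define MY_RING_BYTES	4096	/* hypothetical size of the control area */

	struct my_priv {
		void		*ring;		/* CPU address of the shared memory */
		dma_addr_t	ring_dma;	/* address the device must use */
	};

	static int my_alloc_ring(struct pci_dev *pdev, struct my_priv *priv)
	{
		priv->ring = dma_alloc_coherent(&pdev->dev, MY_RING_BYTES,
						&priv->ring_dma, GFP_KERNEL);
		if (!priv->ring)
			return -ENOMEM;
		return 0;
	}

	static void my_free_ring(struct pci_dev *pdev, struct my_priv *priv)
	{
		/* Size and both addresses must match the allocation. */
		dma_free_coherent(&pdev->dev, MY_RING_BYTES,
				  priv->ring, priv->ring_dma);
	}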

Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" registers initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the Interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors() with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors().

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
of MSI/MSI-X usage.

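A minimal sketch of the interrupt setup described in this section follows.
It assumes a device that needs a single vector; ``my_irq_handler()`` and
``my_setup_irq()`` are hypothetical names, and the device-specific check of
the interrupt source is deliberately left out::

	#include <linux/interrupt.h>
	#include <linux/pci.h>

	/* Hypothetical handler; "data" is the cookie passed to request_irq(). */
	static irqreturn_t my_irq_handler(int irq, void *data)
	{
		struct pci_dev *pdev = data;

		/*
		 * On a shared INTx line, first read a device register to see
		 * whether this device raised the interrupt and return
		 * IRQ_NONE if it did not.  That check is device specific.
		 */
		dev_dbg(&pdev->dev, "interrupt %d\n", irq);
		return IRQ_HANDLED;
	}

	static int my_setup_irq(struct pci_dev *pdev)
	{
		int nvec, err;

		/* Ask for one vector: MSI-X or MSI if possible, else INTx. */
		nvec = pci_alloc_irq_vectors(pdev, 1, 1,
				PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
		if (nvec < 0)
			return nvec;

		/* pci_irq_vector() maps vector index 0 to a Linux IRQ number. */
		err = request_irq(pci_irq_vector(pdev, 0), my_irq_handler,
				  IRQF_SHARED, "my_pci_driver", pdev);
		if (err)
			pci_free_irq_vectors(pdev);

		return err;
	}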

PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and consistent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts the IRQ line, the system will respond assuming
that one of the remaining devices asserted the IRQ line. Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there are any.

Then clean up "consistent" buffers which contain the control data.

See Documentation/DMA-API.txt for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
--------------------------------------------------------
Unmap MMIO or IO Port resources (e.g. with iounmap()) and then call
pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.

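Taken together, the shutdown steps of this chapter might be arranged as in
the sketch below. It assumes the probe routine stored a hypothetical
``struct my_priv`` with pci_set_drvdata(), mapped BAR 0 with pci_iomap() and
registered a single interrupt with the pci_dev as cookie; all ``my_`` and
``MY_`` names are illustrative. Device-specific steps appear as comments::

	#include <linux/interrupt.h>
	#include <linux/pci.h>

	#define MY_BAR	0	/* hypothetical: registers live in BAR 0 */

	/* Hypothetical per-device state, stored with pci_set_drvdata(). */
	struct my_priv {
		void __iomem *regs;
	};

	static void my_remove(struct pci_dev *pdev)
	{
		struct my_priv *priv = pci_get_drvdata(pdev);

		/* 1. Device specific: stop the device from raising interrupts. */

		/* 2. Release the IRQ; the cookie must match request_irq(). */
		free_irq(pci_irq_vector(pdev, 0), pdev);
		pci_free_irq_vectors(pdev);

		/* 3. Device specific: stop DMA, then free the DMA buffers. */

		/* 4. Unregister from other subsystems (netdev, SCSI, ...). */

		/* 5. Unmap registers and stop the device decoding addresses. */
		pci_iounmap(pdev, priv->regs);
		pci_disable_device(pdev);

		/* 6. Make the address range available again. */
		pci_release_region(pdev, MY_BAR);

		/* Freeing priv itself is omitted from this sketch. */
	}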

How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access Extended PCI Capability registers, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you.

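As a small illustration, the sketch below reads the command register by its
symbolic name and looks up the MSI capability. ``my_dump_config()`` is a
hypothetical helper, and the choice of the MSI capability is only an
example::

	#include <linux/pci.h>

	static void my_dump_config(struct pci_dev *pdev)
	{
		u16 cmd;
		int pos;

		/* PCI_COMMAND is the symbolic offset of the command register. */
		if (pci_read_config_word(pdev, PCI_COMMAND, &cmd))
			return;

		dev_info(&pdev->dev, "bus mastering %s\n",
			 (cmd & PCI_COMMAND_MASTER) ? "enabled" : "disabled");

		/* Returns the config space offset of the capability, or 0. */
		pos = pci_find_capability(pdev, PCI_CAP_ID_MSI);
		if (pos)
			dev_info(&pdev->dev, "MSI capability at offset 0x%x\n",
				 pos);
	}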

Other interesting functions
===========================

=============================	================================================
pci_get_domain_bus_and_slot()	Find pci_dev corresponding to given domain,
				bus, and slot number. If the device is
				found, its reference count is increased.
pci_set_power_state()		Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()		Find specified capability in device's capability
				list.
pci_resource_start()		Returns bus start address for a given PCI region
pci_resource_end()		Returns bus end address for a given PCI region
pci_resource_len()		Returns the byte length of a PCI region
pci_set_drvdata()		Set private driver data pointer for a pci_dev
pci_get_drvdata()		Return private driver data pointer for a pci_dev
pci_set_mwi()			Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()			Disable Memory-Write-Invalidate transactions.
=============================	================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user which card it has found), please use pci_name(pci_dev).

Always refer to the PCI devices by a pointer to the pci_dev structure.
All PCI layer functions use this identification and it's the only
reasonable one. Don't use bus/slot/function numbers except for very
special purposes -- on systems with multiple primary buses their semantics
can be pretty complex.

Don't try to turn on Fast Back to Back writes in your driver.  All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.


507229b4e07SChangbin DuVendor and device identifications
508229b4e07SChangbin Du=================================
509229b4e07SChangbin Du
510229b4e07SChangbin DuDo not add new device or vendor IDs to include/linux/pci_ids.h unless they
511229b4e07SChangbin Duare shared across multiple drivers.  You can add private definitions in
512229b4e07SChangbin Duyour driver if they're helpful, or just use plain hex constants.
513229b4e07SChangbin Du
514229b4e07SChangbin DuThe device IDs are arbitrary hex numbers (vendor controlled) and normally used
515229b4e07SChangbin Duonly in a single location, the pci_device_id table.
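
A minimal sketch of such a table, mixing a private definition with a plain
hex constant, might look as follows. The IDs and the foo_* names are
hypothetical. The struct and the PCI_DEVICE() macro are cut-down stand-ins so
that the fragment compiles outside the kernel; a real driver gets both from
<linux/pci.h>, where the struct has further fields::

       #include <stdint.h>

       /* Stand-ins for the kernel definitions (assumption, see above). */
       struct pci_device_id {
               uint32_t vendor, device;
               uint32_t subvendor, subdevice;
       };
       #define PCI_ANY_ID (~0u)
       #define PCI_DEVICE(vend, dev) \
               .vendor = (vend), .device = (dev), \
               .subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID

       /* Private to this driver, so not added to pci_ids.h. */
       #define FOO_VENDOR_ID   0x1234          /* hypothetical */
       #define FOO_DEVICE_ID_A 0x0001          /* hypothetical */

       static const struct pci_device_id foo_pci_tbl[] = {
               { PCI_DEVICE(FOO_VENDOR_ID, FOO_DEVICE_ID_A) },
               { PCI_DEVICE(FOO_VENDOR_ID, 0x0002) },  /* plain hex */
               { 0 }                           /* terminating entry */
       };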

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface.  They are no longer present
in the kernel because they are incompatible with hotplug, PCI domains, or
sane locking.

=================	===========================================
pci_find_device()	Superseded by pci_get_device()
pci_find_subsys()	Superseded by pci_get_subsys()
pci_find_slot()		Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()		Superseded by pci_get_domain_bus_and_slot()
=================	===========================================

The alternative is the traditional PCI device driver that walks PCI
device lists. This is still possible but discouraged.
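
For a driver that really must walk the device list, the usual idiom with
pci_get_device() can be sketched as below. The pci_get_device() and
pci_dev_put() shown here are user-space stand-ins that only imitate the
reference-counting contract of the real functions (a reference is taken on
the device returned and dropped on the device passed as "from"); the device
array, the IDs and foo_count_boards() are hypothetical::

       #include <stddef.h>

       /* Stand-in device structure and device list (assumption). */
       struct pci_dev {
               unsigned short vendor, device;
               int refcnt;
       };

       static struct pci_dev fake_devs[] = {
               { 0x1234, 0x0001, 0 },
               { 0x8086, 0x0002, 0 },
               { 0x1234, 0x0001, 0 },
       };
       #define NR_FAKE (sizeof(fake_devs) / sizeof(fake_devs[0]))

       static void pci_dev_put(struct pci_dev *dev)
       {
               if (dev)
                       dev->refcnt--;
       }

       static struct pci_dev *pci_get_device(unsigned int vendor,
                                             unsigned int device,
                                             struct pci_dev *from)
       {
               size_t i = from ? (size_t)(from - fake_devs) + 1 : 0;

               pci_dev_put(from);      /* "from" is always released */
               for (; i < NR_FAKE; i++) {
                       if (fake_devs[i].vendor == vendor &&
                           fake_devs[i].device == device) {
                               fake_devs[i].refcnt++;
                               return &fake_devs[i];
                       }
               }
               return NULL;
       }

       /* The part a ported driver would contain. */
       static int foo_count_boards(void)
       {
               struct pci_dev *pdev = NULL;    /* NULL starts a new search */
               int n = 0;

               while ((pdev = pci_get_device(0x1234, 0x0001, pdev)) != NULL)
                       n++;
               return n;       /* loop ran to the end: nothing left held */
       }

If the loop is left early with a device still in hand, that reference has to
be dropped with pci_dev_put() once the device is no longer needed.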


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing sensitive code should add readl() where the CPU is
expected to wait before doing other work.  The classic "bit banging"
sequence works fine for I/O Port space::

       for (i = 0; i < 8; i++, val >>= 1) {
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }

The same sequence for MMIO space should be::

       for (i = 0; i < 8; i++, val >>= 1) {
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }

It is important that "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl().  Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
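
The reset case can be modelled in user space as follows. This is a toy
model, not driver code: struct pci_dev, fake_writel() and the
pci_read_config_word() below are stand-ins that imitate a write sitting in a
bridge until a read pushes it through, and FOO_CTRL_RESET and foo_reset() are
hypothetical. In a real driver the write would be a writel() to an ioremap()ed
register, and the kernel's pci_read_config_word() takes a const pointer::

       #include <stdint.h>

       /* Stand-in device: what it has seen versus what is still posted. */
       struct pci_dev {
               uint32_t ctrl;          /* value that reached the device */
               uint32_t posted;        /* value still held in a bridge */
               int pending;
       };

       #define PCI_VENDOR_ID   0x00    /* config offset, as in the kernel */
       #define FOO_CTRL_RESET  0x1     /* hypothetical reset bit */

       /* Posted MMIO write: returns before the device sees the value. */
       static void fake_writel(uint32_t val, struct pci_dev *dev)
       {
               dev->posted = val;
               dev->pending = 1;
       }

       /* Config read: earlier posted writes are delivered first. */
       static int pci_read_config_word(struct pci_dev *dev, int where,
                                       uint16_t *val)
       {
               if (dev->pending) {
                       dev->ctrl = dev->posted;
                       dev->pending = 0;
               }
               *val = (where == PCI_VENDOR_ID) ? 0x1234 : 0xffff;
               return 0;
       }

       static void foo_reset(struct pci_dev *pdev)
       {
               uint16_t tmp;

               fake_writel(FOO_CTRL_RESET, pdev);      /* may be posted */
               pci_read_config_word(pdev, PCI_VENDOR_ID, &tmp); /* flush */
               (void)tmp;      /* the value read is not needed */
       }

The config read is chosen over an MMIO read because a device in reset may not
answer MMIO cycles, while a configuration access that master aborts is
handled by the PCI core on every platform.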