.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is the PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around.
When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc)
  - Enable DMA/processing engines

When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions, either
completely empty or just returning an appropriate error code, to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry. Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.
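
To make the ID table and its wildcard semantics concrete, here is a minimal
user-space model, not kernel code. It assumes the two macros expand to the
field initialisers shown, and that a table entry matches when every non-wildcard
ID is equal and the class bits selected by ``class_mask`` agree. The IDs in
``example_ids``, the ``struct dev_ident`` type, and the helpers ``id_matches()``
and ``match_table()`` are hypothetical names used only for illustration::

	#include <stddef.h>
	#include <stdint.h>

	#define PCI_ANY_ID (~0u)

	/* User-space stand-in for the kernel's struct pci_device_id. */
	struct pci_device_id {
		uint32_t vendor, device;
		uint32_t subvendor, subdevice;
		uint32_t class, class_mask;
		unsigned long driver_data;
	};

	/* Assumed to mirror the field initialisation done by the kernel macros. */
	#define PCI_DEVICE(vend, dev) \
		.vendor = (vend), .device = (dev), \
		.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID

	#define PCI_DEVICE_CLASS(dev_class, dev_class_mask) \
		.class = (dev_class), .class_mask = (dev_class_mask), \
		.vendor = PCI_ANY_ID, .device = PCI_ANY_ID, \
		.subvendor = PCI_ANY_ID, .subdevice = PCI_ANY_ID

	/* Hypothetical IDs, for illustration only. */
	static const struct pci_device_id example_ids[] = {
		{ PCI_DEVICE(0x1234, 0x5678) },
		{ PCI_DEVICE_CLASS(0x020000, 0xffff00) },	/* any Ethernet controller */
		{ 0, }						/* all-zero terminator */
	};

	/* Identity of one device, as it would be read from its config header. */
	struct dev_ident {
		uint32_t vendor, device, subvendor, subdevice, class;
	};

	/* An entry matches if each ID is a wildcard or equal and the masked class agrees. */
	static int id_matches(const struct pci_device_id *id, const struct dev_ident *d)
	{
		return (id->vendor == PCI_ANY_ID || id->vendor == d->vendor) &&
		       (id->device == PCI_ANY_ID || id->device == d->device) &&
		       (id->subvendor == PCI_ANY_ID || id->subvendor == d->subvendor) &&
		       (id->subdevice == PCI_ANY_ID || id->subdevice == d->subdevice) &&
		       !((id->class ^ d->class) & id->class_mask);
	}

	/* Walk the table up to the all-zero terminator; NULL means no entry matched. */
	static const struct pci_device_id *
	match_table(const struct pci_device_id *ids, const struct dev_ident *d)
	{
		for (; ids->vendor || ids->subvendor || ids->class_mask; ids++)
			if (id_matches(ids, d))
				return ids;
		return NULL;
	}

Because ``PCI_DEVICE()`` leaves ``class_mask`` at zero, the first entry accepts
a device of any class, while the second entry ignores vendor and device IDs and
compares only the base class and sub-class bytes.
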

New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need pass only as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.
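
The defaulting rules for ``new_id`` can be illustrated with a small user-space
sketch. It is a model of the behaviour described above, not the kernel's
implementation; ``struct new_id`` and ``parse_new_id()`` are hypothetical names.
It assumes the fields are whitespace-separated hexadecimal numbers and that
anything after the last field supplied keeps its default::

	#include <stdio.h>

	#define PCI_ANY_ID (~0u)

	/* Hypothetical container for one parsed new_id line. */
	struct new_id {
		unsigned int vendor, device;
		unsigned int subvendor, subdevice;
		unsigned int class, class_mask;
		unsigned long driver_data;
	};

	/*
	 * Parse "vendor device [subvendor subdevice class class_mask driver_data]".
	 * Returns 0 on success, -1 if a mandatory field is missing.
	 */
	static int parse_new_id(const char *buf, struct new_id *id)
	{
		int fields;

		/* Defaults for every optional field the user leaves out. */
		id->subvendor = PCI_ANY_ID;
		id->subdevice = PCI_ANY_ID;
		id->class = 0;
		id->class_mask = 0;
		id->driver_data = 0UL;

		fields = sscanf(buf, "%x %x %x %x %x %x %lx",
				&id->vendor, &id->device,
				&id->subvendor, &id->subdevice,
				&id->class, &id->class_mask, &id->driver_data);

		/* vendor and device are mandatory */
		return fields < 2 ? -1 : 0;
	}

With this model, the line ``8086 10d3`` yields wildcard subsystem IDs and a
zero class mask, so the new entry would match that vendor/device pair
regardless of subsystem or class.
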


"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):

  ======  =================================================
  __init  Initialization code. Thrown away after the driver
          initializes.
  __exit  Exit code. Ignored for non-modular drivers.
  ======  =================================================

Tips on when/where to use the above attributes:
  - The module_init()/module_exit() functions (and all
    initialization functions called _only_ from these)
    should be marked __init/__exit.

  - Do not mark the struct pci_driver.

  - Do NOT mark a function if you are not sure which mark to use.
    Better to not mark the function than mark the function wrong.


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is because one PCI device implements several different HW services.
E.g. combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

	struct pci_dev *dev = NULL;

	/* Each call drops the reference on the previous dev and takes one on the next. */
	while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
		configure_device(dev);

Searching by class ID (iterate in a similar way)::

	pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

	pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID. This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().
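
The reference counting rule is easy to get wrong, so here is a user-space
model of it. This is a sketch under stated assumptions, not kernel code: it
assumes that a search resumes after ``from``, takes a reference on the device
it returns, and always drops the reference held on ``from``. All names
(``struct model_dev``, ``bus``, ``model_get_device()``, ``model_dev_put()``,
``count_vendor()``, ``first_of_vendor()``) and the device IDs are hypothetical::

	#include <stddef.h>

	#define ANY_ID (~0u)

	/* Hypothetical stand-in for struct pci_dev: identity plus a reference count. */
	struct model_dev {
		unsigned int vendor, device;
		int refcount;
	};

	/* Three made-up devices, each with one reference held by the "bus". */
	static struct model_dev bus[] = {
		{ 0x1234, 0x0001, 1 },
		{ 0x8086, 0x10d3, 1 },
		{ 0x1234, 0x0002, 1 },
	};
	#define NDEV (sizeof(bus) / sizeof(bus[0]))

	static void model_dev_put(struct model_dev *dev)
	{
		if (dev)
			dev->refcount--;
	}

	/* Resume after 'from', reference the match, always drop the reference on 'from'. */
	static struct model_dev *model_get_device(unsigned int vendor,
						  unsigned int device,
						  struct model_dev *from)
	{
		size_t i = from ? (size_t)(from - bus) + 1 : 0;
		struct model_dev *found = NULL;

		for (; i < NDEV; i++) {
			if ((vendor == ANY_ID || bus[i].vendor == vendor) &&
			    (device == ANY_ID || bus[i].device == device)) {
				found = &bus[i];
				found->refcount++;
				break;
			}
		}
		model_dev_put(from);
		return found;
	}

	/* A loop that runs until NULL leaves every reference count unchanged. */
	static int count_vendor(unsigned int vendor)
	{
		struct model_dev *dev = NULL;
		int n = 0;

		while ((dev = model_get_device(vendor, ANY_ID, dev)) != NULL)
			n++;
		return n;
	}

	/* Stopping at the first match leaves the caller owning one reference. */
	static struct model_dev *first_of_vendor(unsigned int vendor)
	{
		return model_get_device(vendor, ANY_ID, NULL);
	}

In this model a loop that runs to completion needs no explicit put, whereas a
caller that keeps a device from an early exit (as ``first_of_vendor()`` does)
must call the put function itself once it is finished with the device.
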


Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI parts of the chip (i.e. LAN/SCSI/etc)
  - Enable DMA/processing engines.

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage).


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_regions() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   http://lkml.org/lkml/2006/3/2/194


pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS. pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate. Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.


Request MMIO/IOP resources
--------------------------
Memory (MMIO) and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions() below.


Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   Documentation/DMA-API.txt. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling pci_set_dma_mask() with
appropriate parameters. In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
pci_set_dma_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "consistent memory" in System RAM above 4G physical
address by calling pci_set_consistent_dma_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("consistent") data.


Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory.
See Documentation/DMA-API.txt for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.


Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" registers initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the Interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities.
Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors() with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors().

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
of MSI/MSI-X usage.


PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and consistent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts the IRQ line, the system will respond assuming
that one of the remaining devices asserted the IRQ line.
Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there are any.

Then clean up "consistent" buffers which contain the control data.

See Documentation/DMA-API.txt for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
--------------------------------------------------------
iounmap() MMIO or IO Port resources and then call pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.


How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access Extended PCI Capability registers, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you.


Other interesting functions
===========================

============================= ================================================
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
                              bus, slot and function number. If the device is
                              found, its reference count is increased.
pci_set_power_state()         Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()         Find specified capability in device's capability
                              list.
pci_resource_start()          Returns bus start address for a given PCI region
pci_resource_end()            Returns bus end address for a given PCI region
pci_resource_len()            Returns the byte length of a PCI region
pci_set_drvdata()             Set private driver data pointer for a pci_dev
pci_get_drvdata()             Return private driver data pointer for a pci_dev
pci_set_mwi()                 Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()               Disable Memory-Write-Invalidate transactions.
============================= ================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user which card it has found), please use pci_name(pci_dev).

Always refer to the PCI devices by a pointer to the pci_dev structure.
All PCI layer functions use this identification and it's the only
reasonable one. Don't use bus/slot/function numbers except for very
special purposes -- on systems with multiple primary buses their semantics
can be pretty complex.

Don't try to turn on Fast Back to Back writes in your driver. All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.
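
As background for pci_find_capability(), mentioned under "How to access PCI
config space" and in the function table, the following user-space sketch walks
the capability linked list in a 256-byte config space image. It is an
illustration of the list layout, not the kernel's implementation: the function
name ``find_capability()`` and the hop limit are choices made for this example,
while the register offsets are the standard ones from the PCI configuration
header::

	#include <stddef.h>
	#include <stdint.h>

	/* Offsets and bits from the standard PCI configuration header. */
	#define PCI_STATUS		0x06
	#define PCI_STATUS_CAP_LIST	0x10	/* device implements a capability list */
	#define PCI_CAPABILITY_LIST	0x34	/* offset of the first list entry */
	#define PCI_CAP_LIST_ID		0	/* capability ID byte within an entry */
	#define PCI_CAP_LIST_NEXT	1	/* pointer to the next entry */
	#define PCI_CAP_ID_MSI		0x05
	#define PCI_CAP_ID_MSIX		0x11

	/*
	 * Return the offset of capability 'cap' in a config space image, or 0 if
	 * it is absent. The hop limit guards against malformed, looping lists.
	 */
	static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap)
	{
		uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
		uint8_t pos;
		int ttl = 48;

		if (!(status & PCI_STATUS_CAP_LIST))
			return 0;

		pos = cfg[PCI_CAPABILITY_LIST] & ~3;	/* entries are dword aligned */
		while (pos >= 0x40 && ttl--) {
			if (cfg[pos + PCI_CAP_LIST_ID] == cap)
				return pos;
			pos = cfg[pos + PCI_CAP_LIST_NEXT] & ~3;
		}
		return 0;
	}

On a running Linux system, a config space image for experimenting with a
routine like this can be read from the ``config`` file of a device under
/sys/bus/pci/devices/. Inside a driver, use pci_find_capability() and the
config accessors instead of parsing raw bytes.
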


Vendor and device identifications
=================================

Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers. You can add private definitions in
your driver if they're helpful, or just use plain hex constants.

The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface. They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.

================= ===========================================
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot()   Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()    Superseded by pci_get_domain_bus_and_slot()
================= ===========================================

The alternative to pci_register_driver() is the traditional PCI device
driver that walks PCI device lists itself. This is still possible but
discouraged.


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing-sensitive code should add readl() where the CPU is
expected to wait before doing other work.
The classic "bit banging" sequence works fine for I/O Port space::

	for (i = 0; i < 8; i++, val >>= 1) {
		outb(val & 1, ioport_reg);      /* write bit */
		udelay(10);
	}

The same sequence for MMIO space should be::

	for (i = 0; i < 8; i++, val >>= 1) {
		writeb(val & 1, mmio_reg);      /* write bit */
		readb(safe_mmio_reg);           /* flush posted write */
		udelay(10);
	}

It is important that "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl(). Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
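
The effect of write posting can be illustrated outside the kernel. What
follows is only a user-space model, not driver code: struct fake_bridge,
fake_flush(), fake_writeb(), fake_readb() and bitbang() are hypothetical
names invented for this sketch, and the queue depth of 8 is an arbitrary
assumption. Writes are queued by the modelled bridge and only reach the
modelled device register when a read (or a full queue) drains them::

	#include <assert.h>
	#include <stddef.h>
	#include <stdint.h>

	#define POSTED_QUEUE_DEPTH 8	/* assumption: arbitrary depth */

	/* Hypothetical model of a bridge that posts MMIO writes. */
	struct fake_bridge {
		uint8_t device_reg;			/* last value the device saw */
		uint8_t queue[POSTED_QUEUE_DEPTH];	/* accepted, not yet delivered */
		size_t queued;				/* writes waiting in the queue */
		unsigned int delivered;			/* writes that reached the device */
	};

	/* Deliver every queued write to the device, in order. */
	void fake_flush(struct fake_bridge *b)
	{
		for (size_t i = 0; i < b->queued; i++) {
			b->device_reg = b->queue[i];
			b->delivered++;
		}
		b->queued = 0;
	}

	/* Models writeb(): returns as soon as the write is queued. */
	void fake_writeb(struct fake_bridge *b, uint8_t val)
	{
		if (b->queued == POSTED_QUEUE_DEPTH)
			fake_flush(b);		/* queue full: drain first */
		assert(b->queued < POSTED_QUEUE_DEPTH);
		b->queue[b->queued++] = val;
	}

	/* Models readb(): a read may not pass posted writes, so drain first. */
	uint8_t fake_readb(struct fake_bridge *b)
	{
		fake_flush(b);
		return b->device_reg;
	}

	/*
	 * Bit-bang the 8 bits of val, LSB first.  Returns how many writes
	 * had reached the device by the time the loop finished.
	 */
	unsigned int bitbang(struct fake_bridge *b, uint8_t val, int flush)
	{
		for (int i = 0; i < 8; i++, val >>= 1) {
			fake_writeb(b, val & 1);
			if (flush)
				(void)fake_readb(b);	/* flush posted write */
			/* a real driver would udelay(10) here */
		}
		return b->delivered;
	}

In this model, calling bitbang() on a zero-initialized bridge with flush
set to 0 returns 0, because all eight writes are still queued when the loop
ends and the delays would have timed nothing. With flush set to 1 on a fresh
bridge it returns 8, since each write is delivered before the delay starts.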