.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is that PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around.
When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines

When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions either
completely empty or just returning an appropriate error code to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry. Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.
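
As a rough illustration of the table described above, a driver for a
hypothetical device might declare its IDs as follows. The vendor ID, the
device IDs, ``foo_ids`` and ``enum foo_board`` are made-up names for this
sketch and are not taken from any real driver::

    #include <linux/module.h>
    #include <linux/pci.h>

    /* Hypothetical IDs and board types, for illustration only. */
    #define FOO_VENDOR_ID   0x1234

    enum foo_board { FOO_BOARD_A, FOO_BOARD_B };

    static const struct pci_device_id foo_ids[] = {
        /* Match one vendor/device pair, with any subsystem IDs. */
        { PCI_DEVICE(FOO_VENDOR_ID, 0x0001), .driver_data = FOO_BOARD_A },
        { PCI_DEVICE(FOO_VENDOR_ID, 0x0002), .driver_data = FOO_BOARD_B },
        /* Match every device of one class, whatever the vendor. */
        { PCI_DEVICE_CLASS(PCI_CLASS_SERIAL_USB_XHCI, ~0) },
        { }     /* all-zero terminating entry */
    };
    MODULE_DEVICE_TABLE(pci, foo_ids);

The ``driver_data`` value of the matching entry reaches the driver through
the ``id`` argument of its probe routine (``id->driver_data``).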

New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need to pass only as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL.
  - override_only field defaults to 0.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.
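
Putting the pieces of this section together, a minimal skeleton might look
like the sketch below. It assumes a hypothetical driver called ``foo`` with
made-up IDs; the probe and remove bodies are placeholders, not a working
driver::

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/pci.h>

    static const struct pci_device_id foo_ids[] = {
        { PCI_DEVICE(0x1234, 0x0001) },  /* hypothetical vendor/device */
        { }
    };
    MODULE_DEVICE_TABLE(pci, foo_ids);

    static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        pci_info(pdev, "claimed\n");
        return 0;       /* 0 means the driver takes ownership */
    }

    static void foo_remove(struct pci_dev *pdev)
    {
        pci_info(pdev, "released\n");
    }

    static struct pci_driver foo_driver = {
        .name     = "foo",
        .id_table = foo_ids,
        .probe    = foo_probe,
        .remove   = foo_remove,
    };

    static int __init foo_init(void)
    {
        return pci_register_driver(&foo_driver);
    }
    module_init(foo_init);

    static void __exit foo_exit(void)
    {
        /* The PCI layer calls foo_remove() for every bound device. */
        pci_unregister_driver(&foo_driver);
    }
    module_exit(foo_exit);

    MODULE_LICENSE("GPL");

With this skeleton loaded, ``echo "1234 0002" > /sys/bus/pci/drivers/foo/new_id``
would ask the PCI layer to probe a second (equally hypothetical) device ID.
Drivers whose init and exit functions do nothing else can replace both with
``module_pci_driver(foo_driver)``.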


"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):

  ======  =================================================
  __init  Initialization code. Thrown away after the driver
          initializes.
  __exit  Exit code. Ignored for non-modular drivers.
  ======  =================================================

Tips on when/where to use the above attributes:
  - The module_init()/module_exit() functions (and all
    initialization functions called _only_ from these)
    should be marked __init/__exit.

  - Do not mark the struct pci_driver.

  - Do NOT mark a function if you are not sure which mark to use.
    Better to not mark the function than mark the function wrong.


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services.
E.g. combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

  struct pci_dev *dev = NULL;
  while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
          configure_device(dev);

Searching by class ID (iterate in a similar way)::

  pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

  pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID. This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().
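
The reference counting matters most when a search stops early. The sketch
below is one way to handle it, assuming made-up IDs and a hypothetical
helper name ``foo_find_on_bus()``::

    #include <linux/pci.h>

    /* Hypothetical IDs, for illustration only. */
    #define FOO_VENDOR_ID   0x1234
    #define FOO_DEVICE_ID   0x0001

    /* Return the first matching device on bus "busnr", or NULL. */
    static struct pci_dev *foo_find_on_bus(unsigned int busnr)
    {
        struct pci_dev *dev = NULL;

        /*
         * Each call drops the reference on the device passed in as the
         * last argument and takes one on the device it returns, so a
         * loop that runs to completion leaves nothing to release.
         */
        while ((dev = pci_get_device(FOO_VENDOR_ID, FOO_DEVICE_ID, dev))) {
            if (dev->bus->number == busnr)
                return dev;     /* caller must call pci_dev_put() */
        }
        return NULL;
    }

A caller would use the result and then drop the reference, for example
``dev = foo_find_on_bus(0); if (dev) { ...; pci_dev_put(dev); }``.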


Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines.

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage).


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_regions() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/


pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS. pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate. Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.
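
The calls described in this section might be combined as in the following
sketch. The helper name ``foo_enable()`` is hypothetical, and treating
Mem-Wr-Inval as optional is an assumption made for the example::

    #include <linux/pci.h>

    static int foo_enable(struct pci_dev *pdev)
    {
        int err;

        err = pci_enable_device(pdev);  /* can fail: always check */
        if (err)
            return err;

        pci_set_master(pdev);           /* set the bus master bit */

        /* Mem-Wr-Inval is only "nice to have" here, so just report it. */
        if (pci_try_set_mwi(pdev))
            pci_info(pdev, "Memory-Write-Invalidate not enabled\n");

        return 0;
    }

A driver that cannot work without Mem-Wr-Inval would call pci_set_mwi()
instead and treat a non-zero return value as a probe failure.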


Request MMIO/IOP resources
--------------------------
Memory (MMIO) and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions().


Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   Documentation/core-api/dma-api.rst. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling dma_set_mask() with
appropriate parameters. In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
dma_set_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "coherent memory" in System RAM above 4G physical
address by calling dma_set_coherent_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("coherent") data.


Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "coherent" (a.k.a. shared)
memory.
See Documentation/core-api/dma-api.rst for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.


Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" register initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the Interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities.
Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors() with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors().

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/ethernet/broadcom/tg3.c
for examples of MSI/MSI-X usage.


PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts IRQ line, the system will respond assuming
it was one of the remaining devices that asserted the IRQ line.
Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there is one.

Then clean up "coherent" buffers which contain the control data.

See Documentation/core-api/dma-api.rst for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
--------------------------------------------------------
iounmap() MMIO or IO Port resources and then call pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.
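
The initialization and shutdown orders described above can be sketched as a
probe/remove pair. Everything device-specific here is an assumption: the
``foo`` names, the register offsets and the write-1-to-clear status register
are invented for the example, and DMA buffer handling is left out::

    #include <linux/dma-mapping.h>
    #include <linux/interrupt.h>
    #include <linux/io.h>
    #include <linux/pci.h>

    /* Hypothetical register layout, for illustration only. */
    #define FOO_IRQ_ENABLE  0x10    /* 0 disables all device interrupts */
    #define FOO_IRQ_STATUS  0x14    /* assumed write-1-to-clear */
    #define FOO_DMA_CTRL    0x18    /* 0 stops the DMA engines */

    struct foo_priv {
        void __iomem *regs;
    };

    static irqreturn_t foo_irq(int irq, void *dev_id)
    {
        struct foo_priv *priv = dev_id;
        u32 status = readl(priv->regs + FOO_IRQ_STATUS);

        if (!status)
            return IRQ_NONE;        /* shared line, not our device */
        writel(status, priv->regs + FOO_IRQ_STATUS);
        return IRQ_HANDLED;
    }

    static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
        struct foo_priv *priv;
        int err;

        priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
            return -ENOMEM;
        err = pci_enable_device(pdev);
        if (err)
            return err;
        err = pci_request_regions(pdev, "foo");
        if (err) {
            pci_disable_device(pdev);
            return err;
        }
        priv->regs = pci_iomap(pdev, 0, 0);     /* map all of BAR 0 */
        if (!priv->regs) {
            err = -ENOMEM;
            goto err_disable;
        }
        /* Prefer 64-bit DMA for streaming and coherent, else 32-bit. */
        err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
        if (err)
            err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (err)
            goto err_unmap;
        /* One vector: MSI-X or MSI where available, else legacy INTx. */
        err = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
        if (err < 0)
            goto err_unmap;
        writel(0, priv->regs + FOO_IRQ_ENABLE); /* quiesce first */
        err = request_irq(pci_irq_vector(pdev, 0), foo_irq, IRQF_SHARED,
                          "foo", priv);
        if (err)
            goto err_vectors;
        pci_set_drvdata(pdev, priv);
        pci_set_master(pdev);                   /* enable DMA last */
        return 0;

    err_vectors:
        pci_free_irq_vectors(pdev);
    err_unmap:
        pci_iounmap(pdev, priv->regs);
    err_disable:
        pci_disable_device(pdev);
        pci_release_regions(pdev);
        return err;
    }

    static void foo_remove(struct pci_dev *pdev)
    {
        struct foo_priv *priv = pci_get_drvdata(pdev);

        writel(0, priv->regs + FOO_IRQ_ENABLE); /* stop device IRQs */
        readl(priv->regs + FOO_IRQ_ENABLE);     /* flush posted write */
        free_irq(pci_irq_vector(pdev, 0), priv);
        pci_free_irq_vectors(pdev);
        writel(0, priv->regs + FOO_DMA_CTRL);   /* stop DMA */
        readl(priv->regs + FOO_DMA_CTRL);       /* flush posted write */
        pci_clear_master(pdev);
        /* Free DMA buffers and unregister from other subsystems here. */
        pci_iounmap(pdev, priv->regs);
        pci_disable_device(pdev);
        pci_release_regions(pdev);
    }

The sketch releases the regions after disabling the device, as recommended
above, and uses the plural pci_request_regions()/pci_release_regions()
helpers to claim every BAR at once instead of one pci_request_region() call
per BAR.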


How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access Extended PCI Capability registers, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you.


Other interesting functions
===========================

============================= ================================================
pci_get_domain_bus_and_slot() Find pci_dev corresponding to given domain,
                              bus, and slot/function number. If the device
                              is found, its reference count is increased.
pci_set_power_state()         Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()         Find specified capability in device's capability
                              list.
pci_resource_start()          Returns bus start address for a given PCI region
pci_resource_end()            Returns bus end address for a given PCI region
pci_resource_len()            Returns the byte length of a PCI region
pci_set_drvdata()             Set private driver data pointer for a pci_dev
pci_get_drvdata()             Return private driver data pointer for a pci_dev
pci_set_mwi()                 Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()               Disable Memory-Write-Invalidate transactions.
============================= ================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user what card it has found), please use pci_name(pci_dev).

Always refer to the PCI devices by a pointer to the pci_dev structure.
All PCI layer functions use this identification and it's the only
reasonable one. Don't use bus/slot/function numbers except for very
special purposes -- on systems with multiple primary buses their semantics
can be pretty complex.

Don't try to turn on Fast Back to Back writes in your driver. All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.


Vendor and device identifications
=================================

Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers. You can add private definitions in
your driver if they're helpful, or just use plain hex constants.

The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface. They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.

================= ===========================================
pci_find_device() Superseded by pci_get_device()
pci_find_subsys() Superseded by pci_get_subsys()
pci_find_slot()   Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()    Superseded by pci_get_domain_bus_and_slot()
================= ===========================================

The alternative is the traditional PCI device driver that walks PCI
device lists. This is still possible but discouraged.


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing sensitive code should add readl() where the CPU is
expected to wait before doing other work.
The classic "bit banging"
sequence works fine for I/O Port space::

       for (i = 0; i < 8; i++, val >>= 1) {
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }

The same sequence for MMIO space should be::

       for (i = 0; i < 8; i++, val >>= 1) {
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }

It is important that "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl(). Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
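
The posting behaviour described in this section can be modelled in user
space. The sketch below is a toy, not kernel code: struct toy_bridge,
toy_drain(), toy_writeb() and toy_readb() are hypothetical names, and the
queue depth is arbitrary. It assumes a bridge that queues MMIO writes and
delivers all of them before it completes any read, which is the ordering
property the readb() flush relies on.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_REGS	4	/* registers on the pretend device */
#define TOY_QUEUE	8	/* depth of the pretend posting buffer */

struct toy_write {
	unsigned int reg;
	uint8_t val;
};

/* Hypothetical bridge: writes are queued, the device sees them later. */
struct toy_bridge {
	uint8_t dev_regs[TOY_REGS];		/* state inside the device */
	struct toy_write queue[TOY_QUEUE];	/* posted, not yet delivered */
	unsigned int pending;
};

/* Deliver every posted write to the device, oldest first. */
static void toy_drain(struct toy_bridge *b)
{
	unsigned int i;

	for (i = 0; i < b->pending; i++)
		b->dev_regs[b->queue[i].reg] = b->queue[i].val;
	b->pending = 0;
}

/* Posted write: returns before the device has seen the value. */
static void toy_writeb(struct toy_bridge *b, uint8_t val, unsigned int reg)
{
	assert(reg < TOY_REGS);
	if (b->pending == TOY_QUEUE)
		toy_drain(b);	/* buffer full, so the bridge must deliver */
	b->queue[b->pending].reg = reg;
	b->queue[b->pending].val = val;
	b->pending++;
}

/* Read: completes only after all earlier posted writes are delivered. */
static uint8_t toy_readb(struct toy_bridge *b, unsigned int reg)
{
	assert(reg < TOY_REGS);
	toy_drain(b);
	return b->dev_regs[reg];
}
```

In this model dev_regs[] is unchanged when toy_writeb() returns. Only a
following toy_readb(), even of a different register, forces delivery. That
is the role readb(safe_mmio_reg) plays in the MMIO loop shown earlier.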