.. Copyright 2001 Matthew Wilcox
..
..     This documentation is free software; you can redistribute
..     it and/or modify it under the terms of the GNU General Public
..     License as published by the Free Software Foundation; either
..     version 2 of the License, or (at your option) any later
..     version.

===============================
Bus-Independent Device Accesses
===============================

:Author: Matthew Wilcox
:Author: Alan Cox

Introduction
============

Linux provides an API which abstracts performing IO across all busses
and devices, allowing device drivers to be written independently of bus
type.

Memory Mapped IO
================

Getting Access to the Device
----------------------------

The most widely supported form of IO is memory mapped IO. That is, a
part of the CPU's address space is interpreted not as accesses to
memory, but as accesses to a device. Some architectures define devices
to be at a fixed address, but most have some method of discovering
devices. The PCI bus walk is a good example of such a scheme. This
document does not cover how to receive such an address, but assumes you
are starting with one. Physical addresses are of type unsigned long.

This address should not be used directly. Instead, to get an address
suitable for passing to the accessor functions described below, you
should call ioremap(). An address suitable for accessing
the device will be returned to you.

After you've finished using the device (say, in your module's exit
routine), call iounmap() in order to return the address
space to the kernel. Most architectures allocate new address space each
time you call ioremap(), and they can run out unless you
call iounmap().

Accessing the device
--------------------

The part of the interface most used by drivers is reading and writing
memory-mapped registers on the device. Linux provides interfaces to read
and write 8-bit, 16-bit, 32-bit and 64-bit quantities. Due to a
historical accident, these are named byte, word, long and quad accesses.
Both read and write accesses are supported; there is no prefetch support
at this time.

The functions are named readb(), readw(), readl(), readq(),
readb_relaxed(), readw_relaxed(), readl_relaxed(), readq_relaxed(),
writeb(), writew(), writel() and writeq().

Some devices (such as framebuffers) would like to use larger transfers than
8 bytes at a time. For these devices, the memcpy_toio(),
memcpy_fromio() and memset_io() functions are
provided. Do not use memset or memcpy on IO addresses; they are not
guaranteed to copy data in order.

The read and write functions are defined to be ordered. That is, the
compiler is not permitted to reorder the I/O sequence. When the ordering
can be compiler optimised, you can use __readb() and friends to
indicate the relaxed ordering. Use this with care.

While the basic functions are defined to be synchronous with respect to
each other and ordered with respect to each other, the busses the devices
sit on may themselves have asynchronicity. In particular, many authors
are burned by the fact that PCI bus writes are posted asynchronously. A
driver author must issue a read from the same device to ensure that
writes have occurred in the specific cases the author cares about. This
kind of property cannot be hidden from driver writers in the API. In some
cases, the read used to flush the device may be expected to fail (if the
card is resetting, for example). In that case, the read should be done
from config space, which is guaranteed to soft-fail if the card doesn't
respond.

The following is an example of flushing a write to a device when the
driver would like to ensure the write's effects are visible prior to
continuing execution::

    static inline void
    qla1280_disable_intrs(struct scsi_qla_host *ha)
    {
        struct device_reg *reg;

        reg = ha->iobase;
        /* disable risc and host interrupts */
        WRT_REG_WORD(&reg->ictrl, 0);
        /*
         * The following read will ensure that the above write
         * has been received by the device before we return from this
         * function.
         */
        RD_REG_WORD(&reg->ictrl);
        ha->flags.ints_enabled = 0;
    }

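Why the read-back matters can be seen in a toy userspace model of a bus that
posts writes. This is only a sketch for illustration: ``struct posted_dev``,
``model_write16()`` and ``model_read16()`` are hypothetical names, not kernel
interfaces, and real hardware write queues are more involved:

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy model (assumption): a device behind a bus that posts writes. */
    struct posted_dev {
        uint16_t reg;     /* value the device currently holds */
        uint16_t posted;  /* write still in flight on the bus */
        bool pending;     /* true while a posted write is queued */
    };

    /* A posted write returns at once; the device has not seen it yet. */
    static inline void model_write16(struct posted_dev *dev, uint16_t val)
    {
        dev->posted = val;
        dev->pending = true;
    }

    /* A read may not pass earlier writes, so they are delivered first. */
    static inline uint16_t model_read16(struct posted_dev *dev)
    {
        if (dev->pending) {
            dev->reg = dev->posted;
            dev->pending = false;
        }
        return dev->reg;
    }

In this model the device register keeps its old value after model_write16()
returns and only changes once model_read16() forces the queued write through,
which is the effect the RD_REG_WORD() call relies on in the example above.
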
PCI ordering rules also guarantee that PIO read responses arrive after any
outstanding DMA writes from that bus, since for some devices the result of
a readb() call may signal to the driver that a DMA transaction is
complete. In many cases, however, the driver may want to indicate that the
next readb() call has no relation to any previous DMA writes
performed by the device. The driver can use readb_relaxed() for
these cases, although only some platforms will honor the relaxed
semantics. Using the relaxed read functions will provide significant
performance benefits on platforms that support it. The qla2xxx driver
provides examples of how to use readX_relaxed(). In many cases, a majority
of the driver's readX() calls can safely be converted to readX_relaxed()
calls, since only a few will indicate or depend on DMA completion.

Port Space Accesses
===================

Port Space Explained
--------------------

Another form of IO commonly supported is Port Space. This is a range of
addresses separate from the normal memory address space. Access to these
addresses is generally not as fast as accesses to the memory mapped
addresses, and it also has a potentially smaller address space.

Unlike memory mapped IO, no preparation is required to access port
space.

Accessing Port Space
--------------------

Accesses to this space are provided through a set of functions which
allow 8-bit, 16-bit and 32-bit accesses; also known as byte, word and
long. These functions are inb(), inw(),
inl(), outb(), outw() and
outl().

Some variants are provided for these functions. Some devices require
that accesses to their ports are slowed down. This functionality is
provided by appending a ``_p`` to the end of the function.
There are also equivalents to memcpy. The ins() and
outs() functions copy bytes, words or longs to the given
port.

__iomem pointer tokens
======================

The data type for an MMIO address is an ``__iomem`` qualified pointer, such as
``void __iomem *reg``. On most architectures it is a regular pointer that
points to a virtual memory address and can be offset or dereferenced, but in
portable code, it must only be passed from and to functions that explicitly
operate on an ``__iomem`` token, in particular the ioremap() and
readl()/writel() functions. The 'sparse' semantic code checker can be used to
verify that this is done correctly.

While on most architectures, ioremap() creates a page table entry for an
uncached virtual address pointing to the physical MMIO address, some
architectures require special instructions for MMIO, and the ``__iomem`` pointer
just encodes the physical address or an offsettable cookie that is interpreted
by readl()/writel().

Differences between I/O access functions
========================================

readq(), readl(), readw(), readb(), writeq(), writel(), writew(), writeb()

  These are the most generic accessors, providing serialization against other
  MMIO accesses and DMA accesses as well as fixed endianness for accessing
  little-endian PCI devices and on-chip peripherals. Portable device drivers
  should generally use these for any access to ``__iomem`` pointers.

  Note that posted writes are not strictly ordered against a spinlock, see
  Documentation/driver-api/io_ordering.rst.

readq_relaxed(), readl_relaxed(), readw_relaxed(), readb_relaxed(),
writeq_relaxed(), writel_relaxed(), writew_relaxed(), writeb_relaxed()

  On architectures that require an expensive barrier for serializing against
  DMA, these "relaxed" versions of the MMIO accessors only serialize against
  each other, but contain a less expensive barrier operation. A device driver
  might use these in a particularly performance sensitive fast path, with a
  comment that explains why the usage in a specific location is safe without
  the extra barriers.

  See memory-barriers.txt for a more detailed discussion on the precise ordering
  guarantees of the non-relaxed and relaxed versions.

ioread64(), ioread32(), ioread16(), ioread8(),
iowrite64(), iowrite32(), iowrite16(), iowrite8()

  These are an alternative to the normal readl()/writel() functions, with almost
  identical behavior, but they can also operate on ``__iomem`` tokens returned
  for mapping PCI I/O space with pci_iomap() or ioport_map(). On architectures
  that require special instructions for I/O port access, this adds a small
  overhead for an indirect function call implemented in lib/iomap.c, while on
  other architectures, these are simply aliases.

ioread64be(), ioread32be(), ioread16be(),
iowrite64be(), iowrite32be(), iowrite16be()

  These behave in the same way as the ioread32()/iowrite32() family, but with
  reversed byte order, for accessing devices with big-endian MMIO registers.
  Device drivers that can operate on either big-endian or little-endian
  registers may have to implement a custom wrapper function that picks one or
  the other depending on which device was found.

  Note: On some architectures, the normal readl()/writel() functions
  traditionally assume that devices are the same endianness as the CPU, while
  using a hardware byte-reverse on the PCI bus when running a big-endian kernel.
  Drivers that use readl()/writel() this way are generally not portable, but
  tend to be limited to a particular SoC.

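  One possible shape for such a wrapper is sketched here as a self-contained
  userspace model. The names ``struct endian_dev`` and ``model_dev_read32()``
  are made up for illustration, and the byte array stands in for device
  registers; a real driver would more likely choose between ioread32() and
  ioread32be() at this point:

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-device state, filled in when the variant is probed. */
      struct endian_dev {
          const uint8_t *regs;  /* register bytes as the device presents them */
          bool big_endian;      /* true if the registers are big-endian */
      };

      /* Assemble a value from little-endian register bytes. */
      static inline uint32_t model_read32le(const uint8_t *p)
      {
          return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
      }

      /* Assemble a value from big-endian register bytes. */
      static inline uint32_t model_read32be(const uint8_t *p)
      {
          return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
      }

      /* The wrapper: one call site, byte order picked per device. */
      static inline uint32_t model_dev_read32(const struct endian_dev *dev,
                                              unsigned int offset)
      {
          const uint8_t *p = dev->regs + offset;

          return dev->big_endian ? model_read32be(p) : model_read32le(p);
      }

  Keeping the decision in one helper means the rest of the driver does not need
  to know which register layout was found.
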
hi_lo_readq(), lo_hi_readq(), hi_lo_readq_relaxed(), lo_hi_readq_relaxed(),
ioread64_lo_hi(), ioread64_hi_lo(), ioread64be_lo_hi(), ioread64be_hi_lo(),
hi_lo_writeq(), lo_hi_writeq(), hi_lo_writeq_relaxed(), lo_hi_writeq_relaxed(),
iowrite64_lo_hi(), iowrite64_hi_lo(), iowrite64be_lo_hi(), iowrite64be_hi_lo()

  Some devices have 64-bit registers that cannot be accessed atomically
  on 32-bit architectures but allow two consecutive 32-bit accesses instead.
  Since it depends on the particular device which of the two halves has to be
  accessed first, a helper is provided for each combination of 64-bit accessors
  with either low/high or high/low word ordering. A device driver must include
  either <linux/io-64-nonatomic-lo-hi.h> or <linux/io-64-nonatomic-hi-lo.h> to
  get the function definitions along with helpers that redirect the normal
  readq()/writeq() to them on architectures that do not provide 64-bit access
  natively.

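  The access pattern behind these helpers can be modeled in plain C. The sketch
  assumes the low 32 bits live at the lower address, and ``model_readl()``,
  ``model_lo_hi_readq()`` and ``model_hi_lo_readq()`` are illustrative
  stand-ins, not the kernel implementations:

  .. code-block:: c

      #include <stdint.h>

      /* Stand-in for readl(): one 32-bit load from the modeled register. */
      static inline uint32_t model_readl(const volatile uint32_t *addr)
      {
          return *addr;
      }

      /* Read the low half first, then the high half. */
      static inline uint64_t model_lo_hi_readq(const volatile uint32_t *addr)
      {
          uint32_t low = model_readl(addr);
          uint32_t high = model_readl(addr + 1);

          return (uint64_t)low | ((uint64_t)high << 32);
      }

      /* Read the high half first, then the low half. */
      static inline uint64_t model_hi_lo_readq(const volatile uint32_t *addr)
      {
          uint32_t high = model_readl(addr + 1);
          uint32_t low = model_readl(addr);

          return (uint64_t)low | ((uint64_t)high << 32);
      }

  Both variants return the same value for a register that does not change
  between the two loads; the order only matters for devices that latch or
  update one half when the other is accessed.
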
__raw_readq(), __raw_readl(), __raw_readw(), __raw_readb(),
__raw_writeq(), __raw_writel(), __raw_writew(), __raw_writeb()

  These are low-level MMIO accessors without barriers or byteorder changes and
  architecture specific behavior. Accesses are usually atomic in the sense that
  a four-byte __raw_readl() does not get split into individual byte loads, but
  multiple consecutive accesses can be combined on the bus. In portable code, it
  is only safe to use these to access memory behind a device bus but not MMIO
  registers, as there are no ordering guarantees with regard to other MMIO
  accesses or even spinlocks. The byte order is generally the same as for normal
  memory, so unlike the other functions, these can be used to copy data between
  kernel memory and device memory.

inl(), inw(), inb(), outl(), outw(), outb()

  PCI I/O port resources traditionally require separate helpers as they are
  implemented using special instructions on the x86 architecture. On most other
  architectures, these are mapped to readl()/writel() style accessors
  internally, usually pointing to a fixed area in virtual memory. Instead of an
  ``__iomem`` pointer, the address is a 32-bit integer token to identify a port
  number. PCI requires I/O port access to be non-posted, meaning that an outb()
  must complete before the following code executes, while a normal writeb() may
  still be in progress. On architectures that correctly implement this, I/O port
  access is therefore ordered against spinlocks. Many non-x86 PCI host bridge
  implementations and CPU architectures however fail to implement non-posted I/O
  space on PCI, so they can end up being posted on such hardware.

  In some architectures, the I/O port number space has a 1:1 mapping to
  ``__iomem`` pointers, but this is not recommended and device drivers should
  not rely on that for portability. Similarly, an I/O port number as described
  in a PCI base address register may not correspond to the port number as seen
  by a device driver. Portable drivers need to read the port number for the
  resource provided by the kernel.

  There are no direct 64-bit I/O port accessors, but pci_iomap() in combination
  with ioread64/iowrite64 can be used instead.

inl_p(), inw_p(), inb_p(), outl_p(), outw_p(), outb_p()

  On ISA devices that require specific timing, the _p versions of the I/O
  accessors add a small delay. On architectures that do not have ISA buses,
  these are aliases to the normal inb/outb helpers.

readsq, readsl, readsw, readsb
writesq, writesl, writesw, writesb
ioread64_rep, ioread32_rep, ioread16_rep, ioread8_rep
iowrite64_rep, iowrite32_rep, iowrite16_rep, iowrite8_rep
insl, insw, insb, outsl, outsw, outsb

  These are helpers that access the same address multiple times, usually to copy
  data between a kernel memory byte stream and a FIFO buffer. Unlike the normal
  MMIO accessors, these do not perform a byteswap on big-endian kernels, so the
  first byte in the FIFO register corresponds to the first byte in the memory
  buffer regardless of the architecture.

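  The repeated-access pattern can be illustrated with a small userspace model
  of a FIFO data register. ``struct model_fifo``, ``model_fifo_read8()`` and
  ``model_readsb()`` are hypothetical names chosen for this sketch; only the
  overall shape (one register address, many reads, consecutive buffer bytes) is
  meant to mirror readsb():

  .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      /* Toy FIFO (assumption): each read of the data register pops one byte. */
      struct model_fifo {
          const uint8_t *data;  /* bytes queued inside the device */
          size_t pos;           /* index of the next byte to pop */
          size_t len;           /* number of bytes queued */
      };

      /* One read of the single data register; returns 0 once the FIFO is empty. */
      static inline uint8_t model_fifo_read8(struct model_fifo *fifo)
      {
          return fifo->pos < fifo->len ? fifo->data[fifo->pos++] : 0;
      }

      /* Read the same register count times into consecutive buffer bytes. */
      static inline void model_readsb(struct model_fifo *fifo, uint8_t *buf,
                                      size_t count)
      {
          while (count--)
              *buf++ = model_fifo_read8(fifo);
      }

  The first byte popped from the FIFO ends up in the first byte of the buffer.
  For the wider variants, the same byte-stream ordering is what the missing
  byteswap preserves.
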
Device memory mapping modes
===========================

Some architectures support multiple modes for mapping device memory.
ioremap_*() variants provide a common abstraction around these
architecture-specific modes, with a shared set of semantics.

ioremap() is the most common mapping type, and is applicable to typical device
memory (e.g. I/O registers). Other modes can offer weaker or stronger
guarantees, if supported by the architecture. From most to least common, they
are as follows:

ioremap()
---------

The default mode, suitable for most memory-mapped devices, e.g. control
registers. Memory mapped using ioremap() has the following characteristics:

* Uncached - CPU-side caches are bypassed, and all reads and writes are handled
  directly by the device.
* No speculative operations - the CPU may not issue a read or write to this
  memory, unless the instruction that does so has been reached in committed
  program flow.
* No reordering - The CPU may not reorder accesses to this memory mapping with
  respect to each other. On some architectures, this relies on barriers in
  readl_relaxed()/writel_relaxed().
* No repetition - The CPU may not issue multiple reads or writes for a single
  program instruction.
* No write-combining - Each I/O operation results in one discrete read or write
  being issued to the device, and multiple writes are not combined into larger
  writes. This may or may not be enforced when using __raw I/O accessors or
  pointer dereferences.
* Non-executable - The CPU is not allowed to speculate instruction execution
  from this memory (it probably goes without saying, but you're also not
  allowed to jump into device memory).

On many platforms and buses (e.g. PCI), writes issued through ioremap()
mappings are posted, which means that the CPU does not wait for the write to
actually reach the target device before retiring the write instruction.

On many platforms, I/O accesses must be aligned with respect to the access
size; failure to do so will result in an exception or unpredictable results.

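The alignment rule amounts to a simple check on the register offset. The
helper below is a hypothetical userspace sketch (``model_mmio_aligned()`` is
not a kernel function) and assumes the access size is a power of two, as it is
for 1, 2, 4 and 8 byte accesses; kernel code would normally use the existing
IS_ALIGNED() macro for this kind of test:

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * True if an access of size bytes at offset is naturally aligned.
     * Assumes size is a power of two (1, 2, 4 or 8).
     */
    static inline bool model_mmio_aligned(uintptr_t offset, size_t size)
    {
        return (offset & (size - 1)) == 0;
    }

For example, a 4-byte access at offset 0x12 is misaligned, while a 2-byte
access at the same offset is fine.
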
ioremap_wc()
------------

Maps I/O memory as normal memory with write combining. Unlike ioremap(),

* The CPU may speculatively issue reads from the device that the program
  didn't actually execute, and may choose to basically read whatever it wants.
* The CPU may reorder operations as long as the result is consistent from the
  program's point of view.
* The CPU may write to the same location multiple times, even when the program
  issued a single write.
* The CPU may combine several writes into a single larger write.

This mode is typically used for video framebuffers, where it can increase
performance of writes. It can also be used for other blocks of memory in
devices (e.g. buffers or shared memory), but care must be taken as accesses are
not guaranteed to be ordered with respect to normal ioremap() MMIO register
accesses without explicit barriers.

On a PCI bus, it is usually safe to use ioremap_wc() on MMIO areas marked as
``IORESOURCE_PREFETCH``, but it may not be used on those without the flag.
For on-chip devices, there is no corresponding flag, but a driver can use
ioremap_wc() on a device that is known to be safe.

ioremap_wt()
------------

Maps I/O memory as normal memory with write-through caching. Like ioremap_wc(),
but also,

* The CPU may cache writes issued to and reads from the device, and serve reads
  from that cache.

This mode is sometimes used for video framebuffers, where drivers still expect
writes to reach the device in a timely manner (and not be stuck in the CPU
cache), but reads may be served from the cache for efficiency. However, it is
rarely useful these days, as framebuffer drivers usually perform writes only,
for which ioremap_wc() is more efficient (as it doesn't needlessly trash the
cache). Most drivers should not use this.

ioremap_np()
------------

Like ioremap(), but explicitly requests non-posted write semantics. On some
architectures and buses, ioremap() mappings have posted write semantics, which
means that writes can appear to "complete" from the point of view of the
CPU before the written data actually arrives at the target device. Writes are
still ordered with respect to other writes and reads from the same device, but
due to the posted write semantics, this is not the case with respect to other
devices. ioremap_np() explicitly requests non-posted semantics, which means
that the write instruction will not appear to complete until the device has
received (and to some platform-specific extent acknowledged) the written data.

This mapping mode primarily exists to cater for platforms with bus fabrics that
require this particular mapping mode to work correctly. These platforms set the
``IORESOURCE_MEM_NONPOSTED`` flag for a resource that requires ioremap_np()
semantics and portable drivers should use an abstraction that automatically
selects it where appropriate (see the `Higher-level ioremap abstractions`_
section below).

The bare ioremap_np() is only available on some architectures; on others, it
always returns NULL. Drivers should not normally use it, unless they are
platform-specific or they derive benefit from non-posted writes where
supported, and can fall back to ioremap() otherwise. The normal approach to
ensure posted write completion is to do a dummy read after a write as
explained in `Accessing the device`_, which works with ioremap() on all
platforms.

ioremap_np() should never be used for PCI drivers. PCI memory space writes are
always posted, even on architectures that otherwise implement ioremap_np().
Using ioremap_np() for PCI BARs will at best result in posted write semantics,
and at worst result in complete breakage.

Note that non-posted write semantics are orthogonal to CPU-side ordering
guarantees. A CPU may still choose to issue other reads or writes before a
non-posted write instruction retires. See the previous section on MMIO access
functions for details on the CPU side of things.

ioremap_uc()
------------

ioremap_uc() behaves like ioremap() except that on the x86 architecture without
'PAT' mode, it marks memory as uncached even when the MTRR has designated
it as cacheable, see Documentation/arch/x86/pat.rst.

Portable drivers should avoid the use of ioremap_uc().

ioremap_cache()
---------------

ioremap_cache() effectively maps I/O memory as normal RAM. CPU write-back
caches can be used, and the CPU is free to treat the device as if it were a
block of RAM. This should never be used for device memory which has side
effects of any kind, or which does not return the data previously written on
read.

It should also not be used for actual RAM, as the returned pointer is an
``__iomem`` token. memremap() can be used for mapping normal RAM that is outside
of the linear kernel memory area to a regular pointer.

Portable drivers should avoid the use of ioremap_cache().

Architecture example
--------------------

Here is how the above modes map to memory attribute settings on the ARM64
architecture:

+------------------------+--------------------------------------------+
| API                    | Memory region type and cacheability        |
+------------------------+--------------------------------------------+
| ioremap_np()           | Device-nGnRnE                              |
+------------------------+--------------------------------------------+
| ioremap()              | Device-nGnRE                               |
+------------------------+--------------------------------------------+
| ioremap_uc()           | (not implemented)                          |
+------------------------+--------------------------------------------+
| ioremap_wc()           | Normal-Non Cacheable                       |
+------------------------+--------------------------------------------+
| ioremap_wt()           | (not implemented; fallback to ioremap)     |
+------------------------+--------------------------------------------+
| ioremap_cache()        | Normal-Write-Back Cacheable                |
+------------------------+--------------------------------------------+

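The ARM64 table can also be read as a lookup, which makes the two special
cases explicit. The following user-space sketch is only a restatement of the
table; the enum and function names are hypothetical and do not exist in the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifiers for the ioremap variants in the table. */
enum model_ioremap_api {
	MODEL_IOREMAP_NP,
	MODEL_IOREMAP,
	MODEL_IOREMAP_UC,
	MODEL_IOREMAP_WC,
	MODEL_IOREMAP_WT,
	MODEL_IOREMAP_CACHE,
	MODEL_IOREMAP_COUNT,
};

/*
 * Memory type per the ARM64 table. NULL means "not implemented".
 * ioremap_wt() is listed as falling back to ioremap(), so it carries
 * the same Device-nGnRE type.
 */
static const char *const arm64_mem_type[MODEL_IOREMAP_COUNT] = {
	[MODEL_IOREMAP_NP]    = "Device-nGnRnE",
	[MODEL_IOREMAP]       = "Device-nGnRE",
	[MODEL_IOREMAP_UC]    = NULL,
	[MODEL_IOREMAP_WC]    = "Normal-Non Cacheable",
	[MODEL_IOREMAP_WT]    = "Device-nGnRE",
	[MODEL_IOREMAP_CACHE] = "Normal-Write-Back Cacheable",
};

static const char *arm64_mem_type_for(enum model_ioremap_api api)
{
	assert((unsigned int)api < MODEL_IOREMAP_COUNT);
	return arm64_mem_type[api];
}

/* True when both variants resolve to the same memory type in this model. */
static bool arm64_same_mem_type(enum model_ioremap_api a,
				enum model_ioremap_api b)
{
	const char *ta = arm64_mem_type_for(a);
	const char *tb = arm64_mem_type_for(b);

	if (!ta || !tb)
		return false;
	return strcmp(ta, tb) == 0;
}
```

Under this model, ``MODEL_IOREMAP_WT`` and ``MODEL_IOREMAP`` compare equal,
while ``MODEL_IOREMAP_UC`` has no memory type at all.
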
Higher-level ioremap abstractions
=================================

Instead of using the above raw ioremap() modes, drivers are encouraged to use
higher-level APIs. These APIs may implement platform-specific logic to
automatically choose an appropriate ioremap mode on any given bus, allowing for
a platform-agnostic driver to work on those platforms without any special
cases. At the time of this writing, the following ioremap() wrappers have such
logic:

devm_ioremap_resource()

  Can automatically select ioremap_np() over ioremap() according to platform
  requirements, if the ``IORESOURCE_MEM_NONPOSTED`` flag is set on the struct
  resource. Uses devres to automatically unmap the resource when the driver
  probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

of_address_to_resource()

  Automatically sets the ``IORESOURCE_MEM_NONPOSTED`` flag for platforms that
  require non-posted writes for certain buses (see the nonposted-mmio and
  posted-mmio device tree properties).

of_iomap()

  Maps the resource described in a ``reg`` property in the device tree, doing
  all required translations. Automatically selects ioremap_np() according to
  platform requirements, as above.

pci_ioremap_bar(), pci_ioremap_wc_bar()

  Maps the resource described in a PCI base address without having to extract
  the physical address first.

pci_iomap(), pci_iomap_wc()

  Like pci_ioremap_bar()/pci_ioremap_wc_bar(), but also works on I/O space when
  used together with ioread32()/iowrite32() and similar accessors.

pcim_iomap()

  Like pci_iomap(), but uses devres to automatically unmap the resource when
  the driver probe() function fails or a device is unbound from its driver.

  Documented in Documentation/driver-api/driver-model/devres.rst.

Not using these wrappers may make drivers unusable on certain platforms with
stricter rules for mapping I/O memory.

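The division of labour between of_address_to_resource() and
devm_ioremap_resource() described above can be sketched as a two-step
user-space model. This is not the kernel implementation: ``model_resource``,
``MODEL_MEM_NONPOSTED`` and both functions are hypothetical stand-ins, and the
flag value is arbitrary rather than the kernel's ``IORESOURCE_MEM_NONPOSTED``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for IORESOURCE_MEM_NONPOSTED. */
#define MODEL_MEM_NONPOSTED (1UL << 0)

/* Hypothetical, trimmed-down stand-in for struct resource. */
struct model_resource {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

enum model_map_mode {
	MODEL_MAP_IOREMAP,	/* the wrapper would call ioremap() */
	MODEL_MAP_IOREMAP_NP,	/* the wrapper would call ioremap_np() */
};

/*
 * Step 1: the layer that parses the firmware description marks the
 * resource as non-posted when the platform requires it for this bus
 * (the role the text gives to of_address_to_resource()).
 */
static void model_mark_resource(struct model_resource *res,
				bool bus_needs_nonposted)
{
	assert(res != NULL);
	if (bus_needs_nonposted)
		res->flags |= MODEL_MEM_NONPOSTED;
	else
		res->flags &= ~MODEL_MEM_NONPOSTED;
}

/*
 * Step 2: the mapping wrapper inspects the flag and picks the mode
 * (the role the text gives to devm_ioremap_resource()). The driver
 * itself never makes this choice.
 */
static enum model_map_mode model_select_mode(const struct model_resource *res)
{
	assert(res != NULL);
	return (res->flags & MODEL_MEM_NONPOSTED) ?
		MODEL_MAP_IOREMAP_NP : MODEL_MAP_IOREMAP;
}
```

The point of the split is that a driver written against the wrapper stays
unchanged whether or not the platform needs non-posted writes; only the
resource flags differ.
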
Generalizing Access to System and I/O Memory
============================================

.. kernel-doc:: include/linux/iosys-map.h
   :doc: overview

.. kernel-doc:: include/linux/iosys-map.h
   :internal:

Public Functions Provided
=========================

.. kernel-doc:: arch/x86/include/asm/io.h
   :internal:

.. kernel-doc:: lib/pci_iomap.c
   :export: