==================================
Memory Attribute Aliasing on IA-64
==================================

Bjorn Helgaas <bjorn.helgaas@hp.com>

May 4, 2006


Memory Attributes
=================

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

        ==  ======================
        WB  Write-back (cacheable)
        UC  Uncacheable
        WC  Write-coalescing
        ==  ======================

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space.
    For example, some chipsets support either WB or UC access to main
    memory, while others support only WB access.

Memory Map
==========

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot-time, the kernel uses
    the EFI GetMemoryMap() interface.  ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

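    As a rough illustration of why kern_memmap can end up smaller than
    efi_memmap, the sketch below trims a physical range down to whole
    identity-mapping granules (the large pages described under Kernel
    Identity Mappings).  It is not kernel code: GRANULE_SIZE, struct
    region, and trim_to_granules() are hypothetical names, and a 16MB
    granule is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granule size for this sketch: 16MB (64MB is the other option). */
#define GRANULE_SIZE (16ULL << 20)

/* Hypothetical half-open physical range [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/*
 * Shrink r to the largest granule-aligned range it fully contains.
 * Returns the number of usable bytes, or 0 if no whole granule fits
 * (in which case r is left empty).
 */
uint64_t trim_to_granules(struct region *r)
{
    const uint64_t mask = GRANULE_SIZE - 1;
    uint64_t start, end;

    assert(r->start <= r->end);

    /* Rounding up would wrap around: nothing usable. */
    if (r->start > UINT64_MAX - mask) {
        r->end = r->start;
        return 0;
    }

    start = (r->start + mask) & ~mask;  /* round start up   */
    end = r->end & ~mask;               /* round end down   */

    if (start >= end) {
        r->end = r->start;
        return 0;
    }

    r->start = start;
    r->end = end;
    return end - start;
}
```

    With these assumptions, a range running from 1MB to 64MB keeps only
    the three granules between 16MB and 64MB; the partial granule at the
    bottom is dropped.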

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

Kernel Identity Mappings
========================

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions.  This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.

User Mappings
=============

    User mappings are typically done with 16K or 64K pages.
    The smaller page size allows more flexibility because only 16K or
    64K has to be homogeneous with respect to memory attributes.

Potential Attribute Aliasing Cases
==================================

    There are several ways the kernel creates new mappings:

mmap of /dev/mem
----------------

    This uses remap_pfn_range(), which creates user mappings.  These
    mappings may be either WB or UC.  If the region being mapped
    happens to be in kern_memmap, meaning that it may also be mapped
    by a kernel identity mapping, the user mapping must use the same
    attribute as the kernel mapping.

    If the region is not in kern_memmap, the user mapping should use
    an attribute reported as being supported in the EFI memory map.

    Since the EFI memory map does not describe MMIO on some
    machines, this should use an uncacheable mapping as a fallback.

mmap of /sys/class/pci_bus/.../legacy_mem
-----------------------------------------

    This is very similar to mmap of /dev/mem, except that legacy_mem
    only allows mmap of the one megabyte "legacy MMIO" area for a
    specific PCI bus.
    Typically this is the first megabyte of physical address space,
    but it may be different on machines with several VGA devices.

    "X" uses this to access VGA frame buffers.  Using legacy_mem
    rather than /dev/mem allows multiple instances of X to talk to
    different VGA cards.

    The /dev/mem mmap constraints apply.

mmap of /proc/bus/pci/.../??.?
------------------------------

    This is an MMIO mmap of PCI functions, which additionally may or
    may not be requested as using the WC attribute.

    If WC is requested, and the region in kern_memmap is either WC
    or UC, and the EFI memory map designates the region as WC, then
    the WC mapping is allowed.

    Otherwise, the user mapping must use the same attribute as the
    kernel mapping.

read/write of /dev/mem
----------------------

    This uses copy_from_user(), which implicitly uses a kernel
    identity mapping.  This is obviously safe for things in
    kern_memmap.

    There may be corner cases of things that are not in kern_memmap,
    but could be accessed this way.
    For example, registers in MMIO space are not in kern_memmap, but
    could be accessed with a UC mapping.  This would not cause
    attribute aliasing.  But registers typically can be accessed only
    with four-byte or eight-byte accesses, and the copy_from_user()
    path doesn't allow any control over the access size, so this
    would be dangerous.

ioremap()
---------

    This returns a mapping for use inside the kernel.

    If the region is in kern_memmap, we should use the attribute
    specified there.

    If the EFI memory map reports that the entire granule supports
    WB, we should use that (granules that are partially reserved
    or occupied by firmware do not appear in kern_memmap).

    If the granule contains non-WB memory, but we can cover the
    region safely with kernel page table mappings, we can use
    ioremap_page_range() as most other architectures do.

    Failing all of the above, we have to fall back to a UC mapping.

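    The ioremap() decision order above can be sketched as a small pure
    function.  This is only an illustration of the ordering, not the
    kernel's ioremap(): struct ioremap_query, its fields, and
    choose_ioremap_path() are hypothetical names, and the three boolean
    inputs stand in for lookups against kern_memmap, the EFI memory map,
    and the kernel page tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the decision, in the order the text lists it. */
enum ioremap_path {
    IOREMAP_KERN_MEMMAP_ATTR,  /* reuse the attribute from kern_memmap */
    IOREMAP_WB_IDENTITY,       /* whole granule supports WB            */
    IOREMAP_PAGE_TABLE,        /* kernel page table mappings           */
    IOREMAP_UC_FALLBACK        /* nothing else applies                 */
};

/* Hypothetical summary of what the lookups found for one region. */
struct ioremap_query {
    bool in_kern_memmap;   /* region is described by kern_memmap        */
    bool granule_all_wb;   /* EFI says the entire granule supports WB   */
    bool page_table_ok;    /* region can be covered safely by page      */
                           /* table mappings                            */
};

enum ioremap_path choose_ioremap_path(const struct ioremap_query *q)
{
    assert(q != NULL);

    if (q->in_kern_memmap)
        return IOREMAP_KERN_MEMMAP_ATTR;
    if (q->granule_all_wb)
        return IOREMAP_WB_IDENTITY;
    if (q->page_table_ok)
        return IOREMAP_PAGE_TABLE;
    return IOREMAP_UC_FALLBACK;
}
```

    The checks are ordered so that an existing kernel identity mapping
    always wins, which is what prevents the new mapping from aliasing it
    with a different attribute.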

Past Problem Cases
==================

mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
--------------------------------------------------------------------

    The EFI memory map may not report these MMIO regions.

    These must be allowed so that X will work.  This means that
    when the EFI memory map is incomplete, every /dev/mem mmap must
    succeed.  It may create either WB or UC user mappings, depending
    on whether the region is in kern_memmap or the EFI memory map.

mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
----------------------------------------------------------------------

    The EFI memory map reports the following attributes:

        =============== ======= ==================
        0x00000-0x9FFFF WB only
        0xA0000-0xBFFFF UC only (VGA frame buffer)
        0xC0000-0xFFFFF WB only
        =============== ======= ==================

    This mmap is done with user pages, not kernel identity mappings,
    so it is safe to use WB mappings.

    The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
    which uses a granule-sized UC mapping.
    This granule will cover some WB-only memory, but since UC is
    non-speculative, the processor will never generate an uncacheable
    reference to the WB-only areas unless the driver explicitly
    touches them.

mmap of 0x0-0xFFFFF legacy_mem by "X"
-------------------------------------

    If the EFI memory map reports that the entire range supports the
    same attributes, we can allow the mmap (and we will prefer WB if
    supported, as is the case with HP sx[12]000 machines with VGA
    disabled).

    If EFI reports the range as partly WB and partly UC (as on sx[12]000
    machines with VGA enabled), we must fail the mmap because there's no
    safe attribute to use.

    If EFI reports some of the range but not all (as on Intel firmware
    that doesn't report the VGA frame buffer at all), we should fail the
    mmap and force the user to map just the specific region of interest.

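    The three outcomes above can be sketched as a walk over the EFI
    descriptors that overlap the requested range.  This is a minimal
    sketch under stated assumptions, not the kernel's implementation:
    struct efi_desc, the ATTR_* bits, and legacy_mmap_attr() are
    hypothetical, descriptors are assumed not to overlap each other, and
    the UC result for a range that EFI does not describe at all is
    borrowed from the /dev/mem fallback rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bits for this sketch. */
#define ATTR_WB 0x1u
#define ATTR_UC 0x2u

/* Hypothetical descriptor: half-open range [start, end) and its attributes. */
struct efi_desc {
    uint64_t start;
    uint64_t end;
    unsigned attrs;
};

enum legacy_result { LEGACY_MAP_WB, LEGACY_MAP_UC, LEGACY_FAIL };

enum legacy_result legacy_mmap_attr(const struct efi_desc *map, size_t n,
                                    uint64_t start, uint64_t end)
{
    uint64_t covered = 0;
    unsigned attrs = 0;
    int have_attrs = 0;
    size_t i;

    assert(start < end);

    for (i = 0; i < n; i++) {
        uint64_t lo = map[i].start > start ? map[i].start : start;
        uint64_t hi = map[i].end < end ? map[i].end : end;

        if (lo >= hi)
            continue;               /* descriptor does not overlap */
        if (have_attrs && map[i].attrs != attrs)
            return LEGACY_FAIL;     /* e.g. partly WB, partly UC   */
        attrs = map[i].attrs;
        have_attrs = 1;
        covered += hi - lo;
    }

    if (covered == 0)
        return LEGACY_MAP_UC;       /* not described: UC fallback  */
    if (covered != end - start)
        return LEGACY_FAIL;         /* only part of range reported */
    if (attrs & ATTR_WB)
        return LEGACY_MAP_WB;       /* prefer WB when supported    */
    if (attrs & ATTR_UC)
        return LEGACY_MAP_UC;
    return LEGACY_FAIL;
}
```

    Under these assumptions, a map with WB below 0xA0000, UC for the
    frame buffer, and WB from 0xC0000 up rejects a request for the whole
    first megabyte, while a request for just the frame buffer range is
    given a UC mapping.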

mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
------------------------------------------------------------------------

    The EFI memory map reports the following attributes::

        0x00000-0xFFFFF WB only (no VGA MMIO hole)

    This is a special case of the previous case, and the mmap should
    fail for the same reason as above.

read of /sys/devices/.../rom
----------------------------

    For VGA devices, this may cause an ioremap() of 0xC0000.  This
    used to be done with a UC mapping, because the VGA frame buffer
    at 0xA0000 prevents use of a WB granule.  The UC mapping causes
    an MCA on HP sx[12]000 chipsets.

    We should use WB page table mappings to avoid covering the VGA
    frame buffer.

Notes
=====

    [1] SDM rev 2.2, vol 2, sec 4.4.1.

    [2] SDM rev 2.2, vol 2, sec 4.4.6.