.. SPDX-License-Identifier: GPL-2.0

==========================
PAT (Page Attribute Table)
==========================

x86 Page Attribute Table (PAT) allows for setting the memory attribute at
page-level granularity. PAT is complementary to the MTRR settings, which allow
for setting of memory types over physical address ranges. However, PAT is
more flexible than MTRR due to its capability to set attributes at page level
and also due to the fact that there are no hardware limitations on the number
of such attribute settings allowed. The added flexibility comes with guidelines
for not having memory type aliasing for the same physical memory with multiple
virtual addresses.

PAT allows for different types of memory attributes. The most commonly used
ones, which are supported at this time, are:

===  ==============
WB   Write-back
UC   Uncached
WC   Write-combined
WT   Write-through
UC-  Uncached Minus
===  ==============


PAT APIs
========

There are many different APIs in the kernel that allow setting of memory
attributes at the page level. In order to avoid aliasing, these interfaces
should be used thoughtfully.
Below is a table of the interfaces available,
their intended usage and their memory attribute relationships. Internally,
these APIs use a reserve_memtype()/free_memtype() interface on the physical
address range to avoid any aliasing.

+------------------------+----------+--------------+------------------+
| API                    |   RAM    |  ACPI,...    |  Reserved/Holes  |
+------------------------+----------+--------------+------------------+
| ioremap                |    --    |    UC-       |       UC-        |
+------------------------+----------+--------------+------------------+
| ioremap_cache          |    --    |     WB       |       WB         |
+------------------------+----------+--------------+------------------+
| ioremap_uc             |    --    |     UC       |       UC         |
+------------------------+----------+--------------+------------------+
| ioremap_wc             |    --    |     --       |       WC         |
+------------------------+----------+--------------+------------------+
| ioremap_wt             |    --    |     --       |       WT         |
+------------------------+----------+--------------+------------------+
| set_memory_uc,         |    UC-   |     --       |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wc,         |    WC    |     --       |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wt,         |    WT    |     --       |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci sysfs resource     |    --    |     --       |       UC-        |
+------------------------+----------+--------------+------------------+
| pci sysfs resource_wc  |    --    |     --       |       WC         |
| is IORESOURCE_PREFETCH |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |     --       |       UC-        |
| !PCIIOC_WRITE_COMBINE  |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |     --       |       WC         |
| PCIIOC_WRITE_COMBINE   |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| read-write             |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    UC-       |       UC-        |
| mmap SYNC flag         |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| mmap !SYNC flag        |          |              |                  |
| and                    |          |(from existing|  (from existing  |
| any alias to this area |          |alias)        |      alias)      |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |     WB       |       WB         |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says WB           |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |     --       |       UC-        |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says !WB          |          |              |                  |
+------------------------+----------+--------------+------------------+


Advanced APIs for drivers
=========================

A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
vmf_insert_pfn.

Drivers wanting to export some pages to userspace do it by using the mmap
interface and a combination of:

  1) pgprot_noncached()
  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()

With PAT support, the pgprot_writecombine() API is also available. So, drivers
can continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in the memtype
list in order to ensure there is no conflicting mapping.

Note that this set of APIs only works with IO (non RAM) regions.
If a driver
wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
as step 0 above, track the usage of those pages, and use set_memory_wb()
before the pages are returned to the free pool.

MTRR effects on PAT / non-PAT systems
=====================================

The following table provides the effects of using write-combining MTRRs when
using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
mtrr_add() usage will be phased out in favor of arch_phys_wc_add(), which will
be a no-op on PAT enabled systems. The region over which an arch_phys_wc_add()
call is made should already have been ioremapped with WC attributes or PAT
entries; this can be done by using ioremap_wc() / set_memory_wc(). Devices which
combine areas of IO memory desired to remain uncacheable with areas where
write-combining is desirable should consider use of ioremap_uc() followed by
set_memory_wc() to white-list effective write-combined areas. Such use is
nevertheless discouraged as the effective memory type is considered
implementation defined, yet this strategy can be used as a last resort on
devices with size-constrained regions where MTRR write-combining would
otherwise not be effective.
::

  ====  =======  ===  =========================  =====================
  MTRR  Non-PAT  PAT  Linux ioremap value        Effective memory type
  ====  =======  ===  =========================  =====================
        PAT                                        Non-PAT |  PAT
        |PCD                                               |
        ||PWT                                              |
        |||                                                |
  WC    000      WB   _PAGE_CACHE_MODE_WB            WC    |   WC
  WC    001      WC   _PAGE_CACHE_MODE_WC            WC*   |   WC
  WC    010      UC-  _PAGE_CACHE_MODE_UC_MINUS      WC*   |   UC
  WC    011      UC   _PAGE_CACHE_MODE_UC            UC    |   UC
  ====  =======  ===  =========================  =====================

  (*) denotes implementation defined and is discouraged

.. note:: The ``--`` entries in the PAT APIs table above mean "Not suggested
   usage for the API". Some of the ``--`` entries are strictly enforced by the
   kernel. Others are not enforced today, but may be enforced in the future.

For ioremap and pci access through /sys or /proc, the actual type returned
can be more restrictive if there is any existing aliasing for that address.
For example, if there is an existing uncached mapping, a new ioremap_wc can
return an uncached mapping in place of the write-combined mapping requested.

set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where the
driver first makes a region uc, wc or wt and switches it back to wb after use.

Over time, writes to /proc/mtrr will be deprecated in favor of using PAT based
interfaces. Users writing to /proc/mtrr are encouraged to use the interfaces
above instead.

Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.

Drivers should use set_memory_[uc|wc|wt] to set the access type for RAM ranges.


PAT debugging
=============

With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined with::

  # mount -t debugfs debugfs /sys/kernel/debug
  # cat /sys/kernel/debug/x86/pat_memtype_list
  PAT memtype list:
  uncached-minus @ 0x7fadf000-0x7fae0000
  uncached-minus @ 0x7fb19000-0x7fb1a000
  uncached-minus @ 0x7fb1a000-0x7fb1b000
  uncached-minus @ 0x7fb1b000-0x7fb1c000
  uncached-minus @ 0x7fb1c000-0x7fb1d000
  uncached-minus @ 0x7fb1d000-0x7fb1e000
  uncached-minus @ 0x7fb1e000-0x7fb25000
  uncached-minus @ 0x7fb25000-0x7fb26000
  uncached-minus @ 0x7fb26000-0x7fb27000
  uncached-minus @ 0x7fb27000-0x7fb28000
  uncached-minus @ 0x7fb28000-0x7fb2e000
  uncached-minus @ 0x7fb2e000-0x7fb2f000
  uncached-minus @ 0x7fb2f000-0x7fb30000
  uncached-minus @ 0x7fb31000-0x7fb32000
  uncached-minus @ 0x80000000-0x90000000

This list shows physical address ranges and the PAT settings used to
access those physical address ranges.

Another, more verbose way of getting PAT related debug messages is with the
"debugpat" boot parameter. With this parameter, various debug messages are
printed to the dmesg log.

PAT Initialization
==================

The following table describes how PAT is initialized under various
configurations. The PAT MSR must be updated by Linux in order to support WC
and WT attributes. Otherwise, the PAT MSR has the value programmed in it
by the firmware. Note that Xen enables the WC attribute in the PAT MSR for
guests.

  ==== ===== ==========================  =========  =======
  MTRR PAT   Call Sequence               PAT State  PAT MSR
  ==== ===== ==========================  =========  =======
  E    E     MTRR -> PAT init            Enabled    OS
  E    D     MTRR -> PAT init            Disabled   -
  D    E     MTRR -> PAT disable         Disabled   BIOS
  D    D     MTRR -> PAT disable         Disabled   -
  -    np/E  PAT  -> PAT disable         Disabled   BIOS
  -    np/D  PAT  -> PAT disable         Disabled   -
  E    !P/E  MTRR -> PAT init            Disabled   BIOS
  D    !P/E  MTRR -> PAT disable         Disabled   BIOS
  !M   !P/E  MTRR stub -> PAT disable    Disabled   BIOS
  ==== ===== ==========================  =========  =======

  Legend

  ========= =======================================
  E         Feature enabled in CPU
  D         Feature disabled/unsupported in CPU
  np        "nopat" boot option specified
  !P        CONFIG_X86_PAT option unset
  !M        CONFIG_MTRR option unset
  Enabled   PAT state set to enabled
  Disabled  PAT state set to disabled
  OS        PAT initializes PAT MSR with OS setting
  BIOS      PAT keeps PAT MSR with BIOS setting
  ========= =======================================