.. SPDX-License-Identifier: GPL-2.0

=======
The TLB
=======

When the kernel unmaps or modifies the attributes of a range of
memory, it has two choices:

 1. Flush the entire TLB with a two-instruction sequence.  This is
    a quick operation, but it causes collateral damage: TLB entries
    from areas other than the one we are trying to flush will be
    destroyed and must be refilled later, at some cost.
 2. Use the invlpg instruction to invalidate a single page at a
    time.  This could potentially cost many more instructions, but
    it is a much more precise operation, causing no collateral
    damage to other TLB entries.

Which method to use depends on a few things:

 1. The size of the flush being performed.  A flush of the entire
    address space is obviously better performed by flushing the
    entire TLB than doing 2^48/PAGE_SIZE individual flushes.
 2. The contents of the TLB.  If the TLB is empty, then there will
    be no collateral damage caused by doing the global flush, and
    all of the individual flushes will have ended up being wasted
    work.
 3. The size of the TLB.  The larger the TLB, the more collateral
    damage we do with a full flush.
    So, the larger the TLB, the more attractive an individual
    flush looks.  Data and instructions have separate TLBs, as do
    different page sizes.
 4. The microarchitecture.  The TLB has become a multi-level
    cache on modern CPUs, and the global flushes have become more
    expensive relative to single-page flushes.

There is obviously no way the kernel can know all these things,
especially the contents of the TLB during a given flush.  The
sizes of the flushes will vary greatly depending on the workload
as well.  There is essentially no "right" point to choose.

You may be doing too many individual invalidations if you see the
invlpg instruction (or instructions *near* it) show up high in
profiles.  If you believe that individual invalidations are being
called too often, you can lower the tunable::

	/sys/kernel/debug/x86/tlb_single_page_flush_ceiling

This will cause us to do the global flush for more cases.
Lowering it to 0 will disable the use of the individual flushes.
Setting it to 1 is a very conservative setting and it should
never need to be 0 under normal circumstances.

Despite the fact that a single individual flush on x86 is
guaranteed to flush a full 2MB [1]_, hugetlbfs always uses the full
flushes.  THP is treated exactly the same as normal memory.
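
As a rough illustration of how the tunable above shifts the balance,
the following Python sketch models the choice between the two flush
methods.  It is a simplified model under stated assumptions, not the
kernel's implementation: the function name ``choose_flush``, the
4 KiB page size, and the page-aligned range are all hypothetical
choices made for the example.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def choose_flush(start, end, ceiling, page_size=PAGE_SIZE):
    """Model the flush-method choice for the range [start, end).

    Returns "full" when the number of pages in the range exceeds the
    ceiling (flush the entire TLB), otherwise "single" (invalidate
    page by page).  Assumes start and end are page-aligned.
    """
    nr_pages = (end - start) // page_size
    if nr_pages > ceiling:
        return "full"
    return "single"


if __name__ == "__main__":
    # Hypothetical ceilings, chosen only to show the behaviour.
    for ceiling in (0, 1, 8):
        for nr in (1, 2, 16):
            method = choose_flush(0, nr * PAGE_SIZE, ceiling)
            print(f"ceiling={ceiling} pages={nr} -> {method}")
```

In this model a ceiling of 0 sends every range to the full flush,
which matches the description of the tunable above, while a ceiling
of 1 keeps individual invalidation only for single-page ranges.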

You might see invlpg inside of flush_tlb_mm_range() show up in
profiles, or you can use the trace_tlb_flush() tracepoints to
determine how long the flush operations are taking.

Essentially, you are balancing the cycles you spend doing invlpg
with the cycles that you spend refilling the TLB later.

You can measure how expensive TLB refills are by using
performance counters and 'perf stat', like this::

  perf stat -e
    cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/,
    cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/,
    cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/,
    cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/,
    cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/,
    cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/

That works on an IvyBridge-era CPU (i5-3320M).  Different CPUs
may have differently-named counters, but they should at least
be there in some form.  You can use pmu-tools 'ocperf list'
(https://github.com/andikleen/pmu-tools) to find the right
counters for a given CPU.

.. [1] A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation"
   says: "One execution of INVLPG is sufficient even for a page
   with size greater than 4 KBytes."
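
The 'perf stat' counters listed earlier come in duration/completed
pairs, so one way to read them is as an average cost per page walk.
The Python sketch below shows that arithmetic.  It assumes the
``*_walk_duration`` counters report cycles; the helper name
``avg_walk_cycles`` and the sample numbers are hypothetical and are
not measurements from any real system.

```python
def avg_walk_cycles(walk_duration, walk_completed):
    """Average cycles per completed page walk.

    Returns 0.0 when no walks completed, to avoid dividing by zero.
    """
    if walk_completed == 0:
        return 0.0
    return walk_duration / walk_completed


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    samples = {
        "dtlb_load": (3000, 100),
        "dtlb_store": (1200, 50),
        "itlb": (0, 0),
    }
    for name, (duration, completed) in samples.items():
        avg = avg_walk_cycles(duration, completed)
        print(f"{name}: {avg:.1f} cycles per walk")
```

Comparing this per-walk figure with the cost of the invlpg
instructions seen in profiles gives a rough sense of which side of
the trade-off dominates for a given workload.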