.. SPDX-License-Identifier: GPL-2.0

=======
The TLB
=======

When the kernel unmaps or modifies the attributes of a range of
memory, it has two choices:

 1. Flush the entire TLB with a two-instruction sequence.  This is
    a quick operation, but it causes collateral damage: TLB entries
    from areas other than the one we are trying to flush will be
    destroyed and must be refilled later, at some cost.
 2. Use the invlpg instruction to invalidate a single page at a
    time.  This could potentially cost many more instructions, but
    it is a much more precise operation, causing no collateral
    damage to other TLB entries.

Which method to use depends on a few things:

 1. The size of the flush being performed.  A flush of the entire
    address space is obviously better performed by flushing the
    entire TLB than doing 2^48/PAGE_SIZE individual flushes.
 2. The contents of the TLB.  If the TLB is empty, then there will
    be no collateral damage caused by doing the global flush, and
    all of the individual flushes will have ended up being wasted
    work.
 3. The size of the TLB.  The larger the TLB, the more collateral
    damage we do with a full flush.  So, the larger the TLB, the
    more attractive an individual flush looks.  Data and
    instructions have separate TLBs, as do different page sizes.
 4. The microarchitecture.  The TLB has become a multi-level
    cache on modern CPUs, and the global flushes have become more
    expensive relative to single-page flushes.

There is obviously no way the kernel can know all these things,
especially the contents of the TLB during a given flush.  The
size of each flush will also vary greatly depending on the
workload.  There is essentially no "right" point to choose.

You may be doing too many individual invalidations if you see the
invlpg instruction (or instructions _near_ it) show up high in
profiles.  If you believe that individual invalidations are being
called too often, you can lower the tunable::

	/sys/kernel/debug/x86/tlb_single_page_flush_ceiling

This will cause us to do the global flush for more cases.
Lowering it to 0 will disable the use of the individual flushes.
Setting it to 1 is a very conservative setting and it should
never need to be 0 under normal circumstances.

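The effect of the ceiling can be sketched in C.  This is a
simplified userspace model, not the kernel's code: the function
name ``use_full_flush()`` and its parameters are hypothetical, and
it assumes the decision is a plain comparison of the number of
pages in the range against the ceiling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the ceiling check.
 *
 * start, end:  byte addresses bounding the range to flush
 * page_shift:  log2 of the page size (12 for 4 KB pages)
 * ceiling:     stand-in for tlb_single_page_flush_ceiling
 *
 * Returns true when the range covers more pages than the ceiling
 * allows, meaning a full TLB flush would be chosen over invlpg.
 */
bool use_full_flush(uint64_t start, uint64_t end,
		    unsigned int page_shift, uint64_t ceiling)
{
	assert(end >= start);
	assert(page_shift < 64);

	uint64_t nr_pages = (end - start) >> page_shift;

	return nr_pages > ceiling;
}
```

Under this model, with an illustrative ceiling of 33 and 4 KB
pages, a 33-page range is invalidated page by page while a 34-page
range triggers a full flush.  With a ceiling of 0, even a one-page
range triggers a full flush, which matches the description above.
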
Although a single individual flush on x86 is guaranteed to flush
a full 2MB [1]_, hugetlbfs always uses full flushes.  THP is
treated exactly the same as normal memory.

You might see invlpg inside of flush_tlb_mm_range() show up in
profiles, or you can use the trace_tlb_flush() tracepoints to
determine how long the flush operations are taking.

Essentially, you are balancing the cycles you spend doing invlpg
with the cycles that you spend refilling the TLB later.

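That balance can be written down as a rough cost model.  The
sketch below is only an illustration: the structure, the function
names and every cycle count fed into it are assumptions, and a
real TLB does not refill at a fixed cost per entry.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative per-operation costs in cycles; not measurements. */
struct flush_costs {
	uint64_t invlpg_cycles;     /* one invlpg */
	uint64_t full_flush_cycles; /* the full-flush sequence itself */
	uint64_t refill_cycles;     /* one page walk to refill an entry */
};

/* Cost of invalidating nr_pages one page at a time. */
uint64_t individual_cost(const struct flush_costs *c, uint64_t nr_pages)
{
	return nr_pages * c->invlpg_cycles;
}

/*
 * Cost of a full flush: the flush itself plus refilling the
 * unrelated entries (live_entries) that it destroyed.
 */
uint64_t full_cost(const struct flush_costs *c, uint64_t live_entries)
{
	return c->full_flush_cycles + live_entries * c->refill_cycles;
}

/* Smallest page count at which individual flushing costs more. */
uint64_t break_even_pages(const struct flush_costs *c, uint64_t live_entries)
{
	assert(c->invlpg_cycles > 0);

	return full_cost(c, live_entries) / c->invlpg_cycles + 1;
}
```

With made-up costs of 200, 500 and 30 cycles, an empty TLB gives a
break-even point of 3 pages, while 64 live entries push it to 13
pages.  The more useful entries the TLB holds, the longer the
individual flushes stay worthwhile.
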
You can measure how expensive TLB refills are by using
performance counters and 'perf stat', like this, where <command>
is the workload to measure::

  perf stat \
    -e cpu/event=0x8,umask=0x84,name=dtlb_load_misses_walk_duration/ \
    -e cpu/event=0x8,umask=0x82,name=dtlb_load_misses_walk_completed/ \
    -e cpu/event=0x49,umask=0x4,name=dtlb_store_misses_walk_duration/ \
    -e cpu/event=0x49,umask=0x2,name=dtlb_store_misses_walk_completed/ \
    -e cpu/event=0x85,umask=0x4,name=itlb_misses_walk_duration/ \
    -e cpu/event=0x85,umask=0x2,name=itlb_misses_walk_completed/ \
    <command>

That works on an IvyBridge-era CPU (i5-3320M).  Different CPUs
may have differently-named counters, but they should at least
be there in some form.  You can use pmu-tools 'ocperf list'
(https://github.com/andikleen/pmu-tools) to find the right
counters for a given CPU.

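One way to read those counters is to divide each walk_duration
value by the matching walk_completed value.  The helpers below
are a hypothetical sketch: the function names are invented here,
and they assume that walk_duration counts cycles, which should be
checked against the event description for your CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average cost of one completed page walk, from a
 * walk_duration/walk_completed pair.  Returns 0.0 when no walk
 * completed, so an idle counter does not divide by zero.
 */
double avg_walk_cycles(uint64_t walk_duration, uint64_t walk_completed)
{
	if (walk_completed == 0)
		return 0.0;

	return (double)walk_duration / (double)walk_completed;
}

/* Fraction (0.0 to 1.0) of total_cycles spent walking page tables. */
double walk_cycle_fraction(uint64_t walk_duration, uint64_t total_cycles)
{
	assert(total_cycles > 0);

	return (double)walk_duration / (double)total_cycles;
}
```

For example, 3000 walk cycles over 100 completed walks averages
30 cycles per refill.  That figure is the kind of per-entry refill
cost to weigh against the cost of an invlpg.
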
.. [1] A footnote in Intel's SDM "4.10.4.2 Recommended Invalidation"
   says: "One execution of INVLPG is sufficient even for a page
   with size greater than 4 KBytes."