f454a54f | 29-Jan-2019 | Peter Maydell <peter.maydell@linaro.org>
accel/tcg/user-exec: Don't parse aarch64 insns to test for read vs write
In cpu_signal_handler() for aarch64 hosts, we currently parse the faulting instruction to see if it is a load or a store. Since version 3.16 (~2014), the kernel has provided the syndrome register for a fault, which includes the WnR bit. Use that if it is present, and fall back to instruction parsing only if not.
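A minimal sketch of the fast path, assuming the Linux arm64 UAPI record layout from <asm/sigcontext.h> (struct _aarch64_ctx, ESR_MAGIC, and struct esr_context are kernel-provided; the helper name and control flow here are illustrative, not the exact patch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <ucontext.h>
    #include <asm/sigcontext.h>  /* _aarch64_ctx, esr_context, ESR_MAGIC */

    /* Walk the arm64 sigcontext records; if the kernel supplied the ESR
     * (Linux >= 3.16), decide read vs write from the syndrome directly. */
    static bool fault_is_write_from_esr(ucontext_t *uc, bool *is_write)
    {
        struct _aarch64_ctx *hdr =
            (struct _aarch64_ctx *)uc->uc_mcontext.__reserved;

        for (; hdr->magic != 0;
             hdr = (struct _aarch64_ctx *)((char *)hdr + hdr->size)) {
            if (hdr->magic == ESR_MAGIC) {
                uint64_t esr = ((struct esr_context *)hdr)->esr;
                /* EC bits [31:26] == 0b10010x is a data abort; WnR is bit 6 */
                if (((esr >> 27) & 0x1f) == 0x12) {
                    *is_write = (esr >> 6) & 1;
                    return true;
                }
                return false;  /* not a data abort (e.g. an exec fault) */
            }
        }
        return false;  /* old kernel: parse the insn at pc instead */
    }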
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190108180014.32386-1-peter.maydell@linaro.org

3cea94bb | 16-Jan-2019 | Emilio G. Cota <cota@braap.org>
cputlb: do not evict empty entries to the vtlb
Currently we evict an entry to the victim TLB when it doesn't match the current address. But it could be that there's no match because the current entry is empty (i.e. all -1's, for instance via tlb_flush). Do not evict the entry to the vtlb in that case.
This change will help us keep track of the TLB's use rate, which we'll use to implement a policy for dynamic TLB sizing.
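A self-contained sketch of the emptiness test (CPUTLBEntry is trimmed to its comparators here; the real struct and the eviction site live in cputlb.c):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t target_ulong;  /* assumption: 64-bit guest addresses */

    typedef struct CPUTLBEntry {    /* trimmed to the three comparators */
        target_ulong addr_read;
        target_ulong addr_write;
        target_ulong addr_code;
    } CPUTLBEntry;

    /* tlb_flush fills entries with -1 bytes, so an empty entry has all
     * comparators equal to -1 and can never hit; don't evict it. */
    static inline bool tlb_entry_is_empty(const CPUTLBEntry *te)
    {
        return te->addr_read == (target_ulong)-1 &&
               te->addr_write == (target_ulong)-1 &&
               te->addr_code == (target_ulong)-1;
    }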
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-2-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

ab651105 | 23-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Remove tlb_c.pending_flushes
This is essentially redundant with tlb_c.dirty.
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

3d1523ce | 20-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Filter flushes on already clean tlbs
Especially for guests with large numbers of tlbs, like ARM or PPC, we may well not use all of them in between flush operations. Remember which tlbs have been used since the last flush, and avoid any useless flushing.
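A sketch of the filtering with hypothetical names (the patch keeps a dirty bitmask in the shared tlb state and masks the requested set against it in the flush worker):

    #include <stdint.h>

    /* One bit per mmu_idx that has had an entry installed since the
     * last flush; set on tlb_set_page, cleared when that tlb is wiped. */
    static uint16_t tlb_dirty;

    static void note_tlb_used(int mmu_idx)
    {
        tlb_dirty |= 1u << mmu_idx;
    }

    /* Return the subset of 'asked' that actually needs work; a clean
     * tlb is skipped entirely, i.e. its flush is elided. */
    static uint16_t filter_flush(uint16_t asked)
    {
        uint16_t to_clean = asked & tlb_dirty;
        tlb_dirty &= ~to_clean;
        return to_clean;
    }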
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

e09de0a2 | 19-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Count "partial" and "elided" tlb flushes
Our only statistic so far was "full" tlb flushes, where all mmu_idx are flushed at the same time.
Now count "partial" tlb flushes where sets of mmu_idx are flushed, but the set is not maximal. Account one per mmu_idx flushed, as that is the unit of work performed.
We don't actually count elided flushes yet, but go ahead and change the interface presented to the monitor all at once.
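A sketch of the accounting, assuming ALL_MMUIDX_BITS is the maximal set (field names follow the commit message; the wiring is illustrative):

    #include <stddef.h>
    #include <stdint.h>

    #define ALL_MMUIDX_BITS 0xffffu  /* assumption: the maximal mmu_idx set */

    struct tlb_flush_stats {
        size_t full_flush_count;   /* every mmu_idx flushed together */
        size_t part_flush_count;   /* one per mmu_idx, the unit of work */
        size_t elide_flush_count;  /* requested but already clean */
    };

    static void account_flush(struct tlb_flush_stats *s, uint16_t to_clean)
    {
        if (to_clean == ALL_MMUIDX_BITS) {
            s->full_flush_count++;
        } else {
            s->part_flush_count += __builtin_popcount(to_clean);
        }
    }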
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

f8144c6c | 19-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx
The difference between the two sets of APIs is now minuscule.
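The surviving entry point then reduces to a one-line wrapper, roughly (sketch with QEMU-style names):

    void tlb_flush_page(CPUState *cpu, target_ulong addr)
    {
        tlb_flush_page_by_mmuidx(cpu, addr, ALL_MMUIDX_BITS);
    }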
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

64f2674b | 23-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work
The difference between the two sets of APIs is now minuscule.
This allows tlb_flush, tlb_flush_all_cpus, and tlb_flush_all_cpus_synced to be merged with their corresponding by_mmuidx functions as well. For accounting, consider mmu_idx_bitmask = ALL_MMUIDX_BITS to be a full flush.
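The merged entry points likewise collapse to wrappers over the by_mmuidx variants, roughly (sketch):

    void tlb_flush(CPUState *cpu)
    {
        tlb_flush_by_mmuidx(cpu, ALL_MMUIDX_BITS);
    }

    void tlb_flush_all_cpus(CPUState *src_cpu)
    {
        tlb_flush_by_mmuidx_all_cpus(src_cpu, ALL_MMUIDX_BITS);
    }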
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

d5363e58 | 19-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Move env->vtlb_index to env->tlb_d.vindex
The rest of the tlb victim cache is per-tlb; the next-use index should be as well.
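A sketch of the result (struct contents abridged; CPU_VTLB_SIZE is the victim-cache depth, 8 in QEMU):

    #include <stddef.h>

    #define CPU_VTLB_SIZE 8

    typedef struct CPUTLBDesc {  /* per-mmu_idx tlb state, abridged */
        size_t vindex;           /* next victim slot to reuse */
    } CPUTLBDesc;

    /* Round-robin victim selection now stays within one tlb's state. */
    static size_t next_victim_slot(CPUTLBDesc *desc)
    {
        return desc->vindex++ % CPU_VTLB_SIZE;
    }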
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

1308e026 | 17-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Split large page tracking per mmu_idx
The set of large pages in the kernel is probably not the same as the set of large pages in the application. Forcing one range to cover both will flush more often than necessary.
This allows tlb_flush_page_async_work to flush just the one mmu_idx implicated, which in turn allows us to remove tlb_check_page_and_flush_by_mmuidx_async_work.
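A sketch of the per-mmu_idx check this enables (names approximate the patch; on a hit, just that one tlb gets wiped instead of all of them):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint64_t target_ulong;     /* assumption: 64-bit guest */

    typedef struct CPUTLBDesc {        /* now one instance per mmu_idx */
        target_ulong large_page_addr;  /* base of the union of large pages */
        target_ulong large_page_mask;  /* mask covering that union */
    } CPUTLBDesc;

    /* If this hits, wipe just this mmu_idx; otherwise flush the single
     * entry for 'page' and leave the rest of the tlb alone. */
    static bool page_hits_large_range(const CPUTLBDesc *d, target_ulong page)
    {
        return (page & d->large_page_mask) == d->large_page_addr;
    }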
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

60a2ad7d | 20-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush
Protect it with the tlb_lock instead of using atomics. The move puts it in or near the same cacheline as the lock; using the lock means we don't need a second atomic operation in order to perform the update, which also makes it cheap to update pending_flush in tlb_flush_by_mmuidx_async_work.
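A sketch of the update under the lock (qemu_spin_lock is QEMU's spinlock API; the helper name and the CPUArchState plumbing are illustrative):

    static void set_pending_flush(CPUArchState *env, uint16_t idxmap)
    {
        qemu_spin_lock(&env->tlb_c.lock);
        env->tlb_c.pending_flush |= idxmap;  /* no separate atomic op */
        qemu_spin_unlock(&env->tlb_c.lock);
    }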
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

8ab10266 | 20-Oct-2018 | Richard Henderson <richard.henderson@linaro.org>
cputlb: Remove tcg_enabled hack from tlb_flush_nocheck
The bugs this was working around were fixed with commits
  022d6378c7fd target/unicore32: remove tlb_flush from uc32_init_fn
  6e11beecfde0 target/alpha: remove tlb_flush from alpha_cpu_initfn
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

403f290c | 16-Oct-2018 | Emilio G. Cota <cota@braap.org>
cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not take tlb_lock must use atomic_read to avoid undefined behaviour (UB).
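The accessor this conversion ends with looks roughly like the following (TCG_OVERSIZED_GUEST covers hosts whose atomics are narrower than the guest's target_ulong, where writers hold tlb_lock instead; atomic_read is QEMU's wrapper over __atomic_load_n):

    static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
    {
    #if TCG_OVERSIZED_GUEST
        return entry->addr_write;
    #else
        return atomic_read(&entry->addr_write);
    #endif
    }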
This completes the conversion to tlb_lock. This conversion results on average in no performance loss, as the following experiments (run on an Intel i7-6700K CPU @ 4.00GHz) show.
1. aarch64 bootup+shutdown test:
- Before: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):
      7487.087786  task-clock (msec)   #  0.998 CPUs utilized       ( +- 0.12% )
   31,574,905,303  cycles              #  4.217 GHz                 ( +- 0.12% )
   57,097,908,812  instructions        #  1.81 insns per cycle      ( +- 0.08% )
   10,255,415,367  branches            #  1369.747 M/sec            ( +- 0.08% )
      173,278,962  branch-misses       #  1.69% of all branches     ( +- 0.18% )
7.504481349 seconds time elapsed ( +- 0.14% )
- After: Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):
      7462.441328  task-clock (msec)   #  0.998 CPUs utilized       ( +- 0.07% )
   31,478,476,520  cycles              #  4.218 GHz                 ( +- 0.07% )
   57,017,330,084  instructions        #  1.81 insns per cycle      ( +- 0.05% )
   10,251,929,667  branches            #  1373.804 M/sec            ( +- 0.05% )
      173,023,787  branch-misses       #  1.69% of all branches     ( +- 0.11% )
7.474970463 seconds time elapsed ( +- 0.07% )
2. SPEC06int:
[ASCII bar chart elided: "SPEC06int (test set)", Y axis: Speedup over master; compares tlb-lock-v2 (mutex) and tlb-lock-v3 (spinlock) across the SPEC06int benchmarks and their geomean. See the png link below.]
png: https://imgur.com/a/BHzpPTW
Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181016153840.25877-1-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

e6cd4bb5 | 15-Aug-2018 | Richard Henderson <richard.henderson@linaro.org>
tcg: Split CONFIG_ATOMIC128
GCC 7+ will no longer advertise support for 16-byte __atomic operations if only cmpxchg is supported, as on x86_64. Fortunately, x86_64 still supports __sync_compare_and_swap_16 and we can make use of that. AArch64 has never had such support, so open-code it.
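A minimal sketch of the x86_64 fallback (build with -mcx16 so GCC may emit cmpxchg16b; the wrapper name is illustrative):

    /* gcc -mcx16 -O2 ... */
    typedef unsigned __int128 u128;

    /* The legacy __sync builtin still compiles to cmpxchg16b even where
     * 16-byte __atomic support is not advertised (GCC 7+). */
    static inline u128 cmpxchg128(u128 *ptr, u128 cmp, u128 new_val)
    {
        return __sync_val_compare_and_swap(ptr, cmp, new_val);
    }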
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>