x86/haswell/hsw-metrics.json

3         "BriefDescription": "C2 residency percent per package",
4         "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
10         "BriefDescription": "C3 residency percent per core",
11         "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
17         "BriefDescription": "C3 residency percent per package",
18         "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
24         "BriefDescription": "C6 residency percent per core",
25         "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
31         "BriefDescription": "C6 residency percent per package",
32         "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
38         "BriefDescription": "C7 residency percent per core",
39         "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
45         "BriefDescription": "C7 residency percent per package",
46         "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
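Note on the C-state hits above: each MetricExpr divides a residency counter by TSC, which gives a fraction, while the BriefDescription speaks of a percent. The sketch below assumes the factor of 100 is applied elsewhere in the metric definition (that part is not visible in this listing). The function name and arguments are hypothetical and are not part of the JSON file.

```python
def residency_percent(residency_count: float, tsc_count: float) -> float:
    """Convert a C-state residency counter delta into a percentage of TSC ticks.

    Both arguments are assumed to be deltas sampled over the same interval.
    Returns 0.0 when no TSC ticks were observed, to avoid dividing by zero.
    """
    if tsc_count <= 0:
        return 0.0
    # Mirrors the listed expression "<residency counter> / TSC", scaled to percent.
    return 100.0 * residency_count / tsc_count


if __name__ == "__main__":
    # Hypothetical sample: 250 residency ticks out of 1000 TSC ticks.
    print(residency_percent(250, 1000))
```

The zero guard is a choice made for this sketch only; the listed expressions contain no such guard.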
59         "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
78 …sible; which incur a few cycles load re-issue. However; the short re-issue duration is often hidde…
96 …er-cases for operations that cannot be handled natively by the execution pipeline. For example; wh…
102         "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)",
107 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
112 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / …
117 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
128 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
137 … corrected path; following all sorts of mispredicted branches. For example; branchy code with lo…
143         "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
147 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
161 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
163         "MetricExpr": "tma_backend_bound - tma_memory_bound",
168 …-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in s…
172 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
178 … cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Dat…
193 …"MetricExpr": "(1 - MEM_LOAD_UOPS_RETIRED.L3_HIT / (MEM_LOAD_UOPS_RETIRED.L3_HIT + 7 * MEM_LOAD_UO…
202 …"MetricExpr": "(IDQ.ALL_DSB_CYCLES_ANY_UOPS - IDQ.ALL_DSB_CYCLES_4_UOPS) / tma_info_core_core_clks…
215 …o switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-en…
224 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
228 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
233 …-level data TLB store misses.  As with ordinary data caching; focus on improving data locality and…
242 …hreading hiccup; where multiple Logical Processors contend on different data-elements mapped into …
257         "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
267 …"MetricExpr": "4 * min(CPU_CLK_UNHALTED.THREAD, IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE) /…
272 …he CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB mi…
282 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
286 … slots where the CPU was retiring heavy-weight operations -- instructions that require two or more…
292 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
311 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
324         "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
330 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
399         "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
405         "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
411         "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
435 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
442 …"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is…
447 …ublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is …
468 …      "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
474         "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
480         "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
486         "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
558 …"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #S…
575 …   "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
593 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
623 …"MetricExpr": "max((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING) - CYCLE_ACTIVI…
627 … TLB. These cases are characterized by execution unit stalls; while some non-completed demand load…
632 …"MetricExpr": "(CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING) / tma_info_t…
669 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
670         "MetricExpr": "tma_retiring - tma_heavy_operations",
675 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
681 …HED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT…
691 …"MetricExpr": "MEM_UOPS_RETIRED.LOCK_LOADS / MEM_UOPS_RETIRED.ALL_STORES * min(CPU_CLK_UNHALTED.TH…
701         "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
706 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
711 …"MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, cpu@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD\\,cmask\\…
715 …-core traffic is generated by all IA cores. This metric does not aggregate non-data-read requests …
720 …"MetricExpr": "min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tm…
730 …min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING) + RESOURCE_STALLS.SB) / (min(CPU_C…
735 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
749 …"MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / tma_info_core_core_cl…
753 …the legacy decode pipeline). This pipeline is used for code that was not pre-cached in the DSB or …
762 … Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legac…
784 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
789 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
793 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
798 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
802 …is metric represents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data)",
807 …sents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data). Sample with: U…
829 …ents Core fraction of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address)",
834 …action of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address). Sample with…
838 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
840 …min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) + (cpu@UOPS_EXECUTED.CORE\\,cmask\\…
844 …-related).  Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
849 …ORE\\,inv\\,cmask\\=1@ / 2 if #SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_…
853 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
858 …_EXECUTED.CORE\\,cmask\\=1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=2@) / 2 if #SMT_on else (cpu@UOPS_E…
862 …-dependency among software instructions; or oversubscribing a particular hardware resource. I…
867 …_EXECUTED.CORE\\,cmask\\=2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3@) / 2 if #SMT_on else (cpu@UOPS_E…
871 …cal Core cycles otherwise).  Loop Vectorization -most compilers feature auto-Vectorization options…
889 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is …
893 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
899 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
908 …resents rate of split store accesses.  Consider aligning your data to the 64-byte cache line granu…
912 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
917 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
921 … CPU was stalled due to RFO store memory accesses; RFO stores issue a read-for-ownership request b…
926 …ses; RFO stores issue a read-for-ownership request before the write. Even though store accesses do …
935 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
941 … * 9 * (1 - MEM_UOPS_RETIRED.LOCK_LOADS / MEM_UOPS_RETIRED.ALL_STORES) + (1 - MEM_UOPS_RETIRED.LOC…
945 …-of-order core performance; however; holding resources for longer time can lead into undesired imp…
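Note on the fully visible derived expressions (file lines 102, 143, 163, 257, 670, 701): each one builds a metric from other tma_* metrics, either as the remainder of 1 or as a parent minus one child. The sketch below shows only that arithmetic. The dictionary keys on the left-hand side are inferred labels, not names shown in this listing, and the input values in the usage example are made up.

```python
def derive_topdown(m: dict) -> dict:
    """Evaluate the residual-style expressions from the listing.

    `m` maps tma_* metric names (as they appear inside the listed
    MetricExpr strings) to fractions in the range 0..1.
    Output keys are illustrative labels chosen for this sketch.
    """
    out = {}
    # File line 102: 1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)
    backend = 1 - (m["tma_frontend_bound"] + m["tma_bad_speculation"] + m["tma_retiring"])
    out["backend_remainder"] = backend
    # File line 163: tma_backend_bound - tma_memory_bound
    # (assumes file line 102 is the expression that defines tma_backend_bound)
    out["backend_minus_memory"] = backend - m["tma_memory_bound"]
    # File line 257: tma_frontend_bound - tma_fetch_latency
    out["frontend_minus_fetch_latency"] = m["tma_frontend_bound"] - m["tma_fetch_latency"]
    # File line 670: tma_retiring - tma_heavy_operations
    out["retiring_minus_heavy"] = m["tma_retiring"] - m["tma_heavy_operations"]
    # File line 701: tma_bad_speculation - tma_branch_mispredicts
    out["bad_spec_minus_mispredicts"] = m["tma_bad_speculation"] - m["tma_branch_mispredicts"]
    # File line 143: max(0, tma_microcode_sequencer - tma_assists), clamped at zero
    out["microcode_minus_assists"] = max(0, m["tma_microcode_sequencer"] - m["tma_assists"])
    return out


if __name__ == "__main__":
    # Hypothetical input fractions, for illustration only.
    sample = {
        "tma_frontend_bound": 0.20,
        "tma_bad_speculation": 0.10,
        "tma_retiring": 0.30,
        "tma_memory_bound": 0.25,
        "tma_fetch_latency": 0.15,
        "tma_heavy_operations": 0.05,
        "tma_branch_mispredicts": 0.08,
        "tma_microcode_sequencer": 0.01,
        "tma_assists": 0.02,
    }
    for name, value in derive_topdown(sample).items():
        print(f"{name}: {value:.3f}")
```

Only the expression at file line 143 is clamped with max(0, ...); the other differences can go slightly negative when the underlying counters are noisy, which matches the expressions as listed.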