Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that
consist of atom cpus and core cpus. Each cpu type has a dedicated event
list. Some events are available only on core cpus, some only on atom
cpus, and some on both.

The kernel exports two new cpu pmus via sysfs:
/sys/devices/cpu_core
/sys/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

cat /sys/devices/cpu_core/cpus
0-15

cat /sys/devices/cpu_atom/cpus
16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.

Quickstart

List hybrid event
-----------------

As before, use perf-list to list the symbolic events.

perf list

inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on
different pmus.

Enable hybrid event with a specific pmu
---------------------------------------

To enable a core-only or atom-only event, the following syntax is supported:

        cpu_core/<event name>/
or
        cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus.

        perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically
------------------------------------------------------

When an event is created that is available on both atom and core,
two events are created automatically: one for atom, the other for
core. Most hardware events and cache events are available on both
cpu_core and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles).
But on a hybrid platform, the kernel needs to know where the event comes
from (atom or core). The original perf event type PERF_TYPE_HARDWARE
can't carry pmu information, so this type is now extended to be a
PMU-aware type. The PMU type ID is stored at attr.config[63:32].

The PMU type ID is retrieved from sysfs.
/sys/devices/cpu_atom/type
/sys/devices/cpu_core/type

The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                    AA: hardware event ID
                                    EEEEEEEE: PMU type ID

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
a PMU-aware type. The PMU type ID is stored at attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                    BB: hardware cache ID
                                    CC: hardware cache op ID
                                    DD: hardware cache op result ID
                                    EEEEEEEE: PMU type ID

When a hardware event is enabled without a specified pmu, such as
perf stat -e cycles -a (system-wide in this example), two events
are created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

type 0 is PERF_TYPE_HARDWARE (the type field is not shown in the dumps above).
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type id is not a fixed value).
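
The attr.config layouts described above can be sketched in a few lines of
Python. This is only an illustration of the bit layout, not code taken from
perf or the kernel. The helper names (encode_hw_config, decode_hw_config,
decode_hw_cache_config) are hypothetical, and the PMU type IDs 4 and 8 are
simply the values from the example dumps; on a real system they would be
read from the sysfs 'type' files.

```python
# Illustrative helpers for the PMU-aware attr.config layout.
# All names here are hypothetical; they are not perf or kernel APIs.

PMU_TYPE_SHIFT = 32  # PMU type ID lives in attr.config[63:32]


def encode_hw_config(pmu_type, event_id):
    """Build a PERF_TYPE_HARDWARE config: 0xEEEEEEEE000000AA."""
    return (pmu_type << PMU_TYPE_SHIFT) | (event_id & 0xFF)


def decode_hw_config(config):
    """Split a PERF_TYPE_HARDWARE config into PMU type ID and event ID."""
    return {"pmu_type": config >> PMU_TYPE_SHIFT, "event_id": config & 0xFF}


def decode_hw_cache_config(config):
    """Split a PERF_TYPE_HW_CACHE config: 0xEEEEEEEE00DDCCBB."""
    return {
        "pmu_type": config >> PMU_TYPE_SHIFT,
        "cache_id": config & 0xFF,            # BB
        "op_id": (config >> 8) & 0xFF,        # CC
        "result_id": (config >> 16) & 0xFF,   # DD
    }


# 'cycles' has hardware event ID 0; 4 and 8 are the PMU type IDs
# from the example dumps (assumed values, they vary per system).
print(hex(encode_hw_config(4, 0)))
print(decode_hw_config(0x800000000))
```

Decoding 0x800000000 this way yields PMU type ID 8 and hardware event ID 0,
which matches the cpu_atom 'cycles' event in the second dump.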

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).

In the perf-stat result, two events are displayed:

 Performance counter stats for 'system wide':

           6,744,979      cpu_core/cycles/
           1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:
--------------------

perf-stat reports the scaled counts for hybrid events, each with a
percentage displayed. The percentage is the event's running time
divided by its enabled time.

In this example, 'triad_loop' runs on cpu16 (an atom cpu), and we can see
that the scaled value for core cycles is 233,066,666 with a percentage
of 0.43%.

perf stat -e cycles -- taskset -c 16 ./triad_loop

As before, two events are created.

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x400000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

and

------------------------------------------------------------
perf_event_attr:
  size                             120
  config                           0x800000000
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                                              (0.43%)
       604,097,080      cpu_atom/cycles/                                              (99.57%)

perf-record:
------------

If no '-e' is specified for perf record on a hybrid platform,
it creates two default 'cycles' events and adds them to the event list:
one for core, the other for atom.
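
The way one event name becomes one event per pmu, each bound to that pmu's
cpus, can be sketched as follows. This is a rough illustration under stated
assumptions, not how perf implements it. The function names parse_cpu_list
and expand_event are hypothetical, and the cpu lists are the '0-15' and
'16-23' strings from the sysfs example rather than values read from a live
system.

```python
# Sketch: expand one event name into per-pmu events and their cpus.
# Function names are hypothetical; this is not perf's implementation.


def parse_cpu_list(text):
    """Parse a cpu list string such as '0-15' or '0-3,8' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def expand_event(event, pmu_cpus):
    """Return {'<pmu>/<event>/': [cpus]} for every pmu in pmu_cpus."""
    return {
        f"{pmu}/{event}/": parse_cpu_list(cpus)
        for pmu, cpus in pmu_cpus.items()
    }


# Cpu lists copied from the sysfs example (assumed, not read from sysfs).
events = expand_event("cycles", {"cpu_core": "0-15", "cpu_atom": "16-23"})
for name, cpus in events.items():
    print(name, "first cpu:", cpus[0], "last cpu:", cpus[-1])
```

On a real system the cpu list strings would come from the 'cpus' file under
each pmu directory in sysfs.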

perf-stat:
----------

If no '-e' is specified for perf stat on a hybrid platform,
then besides the software events, the following events are created and
added to the event list in order.

cpu_core/cycles/,
cpu_atom/cycles/,
cpu_core/instructions/,
cpu_atom/instructions/,
cpu_core/branches/,
cpu_atom/branches/,
cpu_core/branch-misses/,
cpu_atom/branch-misses/

Of course, both perf-stat and perf-record support enabling a
hybrid event with a specific pmu.

e.g.
perf stat -e cpu_core/cycles/
perf stat -e cpu_atom/cycles/
perf stat -e cpu_core/r1a/
perf stat -e cpu_atom/L1-icache-loads/
perf stat -e cpu_core/cycles/,cpu_atom/instructions/
perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' returns a
warning and disables grouping, because the pmus in the group
do not match (cpu_core vs. cpu_atom).
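
The grouping rule can be checked ahead of time with a small script. The
sketch below is only an illustration of the rule as stated here (every
event in a group must name the same pmu). It does not reproduce perf's
event parser, and the helper names group_pmus and pmus_match are
hypothetical.

```python
import re

# Sketch: check whether every event in a '{...}' group names the same pmu.
# Helper names are hypothetical; this is not perf's event parser.


def group_pmus(group):
    """Return the pmu names found in 'pmu/event/' terms, in order."""
    return re.findall(r"(\w+)/[^/]*/", group)


def pmus_match(group):
    """True if all 'pmu/event/' terms in the group use one pmu."""
    return len(set(group_pmus(group))) <= 1


for group in (
    "{cpu_core/cycles/,cpu_core/instructions/}",
    "{cpu_core/cycles/,cpu_atom/instructions/}",
):
    verdict = "ok" if pmus_match(group) else "pmus do not match"
    print(group, "->", verdict)
```

A script that generates perf command lines could use such a check to avoid
building groups that perf would un-group with a warning.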