.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the nest IMC counter data
and moves it to memory.

The Core and Thread IMC PMU counters are handled in the core. Core level PMU
counters give us the IMC counters' data per core and thread level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree.
The event's information contains:

- Event name
- Event offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties of those events must be
inherited from the PMU.

The event offset in the memory is where the counter data gets accumulated.

The IMC catalog is available at:
    https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their events' information and registers each PMU and its attributes in the
kernel.

IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/           [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/           [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                 [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                 [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/               [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/               [Kernel PMU event]

To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
==============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information.
The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the event
to be monitored and the sampling duration. On each overflow in the CPMCxSEL,
hardware snapshots the program counter along with event counts and writes them
into the memory pointed to by LDBAR.

LDBAR is a 64-bit special purpose per-thread register. It has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around if the memory
buffer reaches the end.

*Currently the event monitored for trace-mode is fixed as cycle.*

Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [....]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using perf report.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC
trace mode snapshots the program counter and writes it to memory. This
also provides a way for the operating system to do instruction sampling in real
time without PMI processing overhead.

The following compares `perf top` with and without the trace-imc event.

PMI counts when the `perf top` command is executed first without and then with
the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts


That is, the PMI interrupt counts do not increment when using the `trace_imc` event.
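
The before/after comparison of the PMI counts can also be scripted. The
following is a minimal sketch and not part of perf or the kernel: `pmi_total`
and `pmi_delta` are hypothetical helper names, and the sketch assumes the
`PMI:` line format of `/proc/interrupts` shown in the output above (a label
followed by one count per CPU and a description).

.. code-block:: sh

  # Sum the per-CPU PMI counts found in an interrupts file.
  # $1 is optional and defaults to /proc/interrupts (hypothetical helper).
  pmi_total() {
      awk '$1 == "PMI:" {
               # Add up the numeric per-CPU columns; stop at the description.
               for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                   sum += $i
           }
           END { printf "%.0f\n", sum }' "${1:-/proc/interrupts}"
  }

  # Run a command and print how many PMIs were raised system-wide meanwhile.
  pmi_delta() {
      before=$(pmi_total)
      "$@"
      after=$(pmi_total)
      echo "PMI delta: $((after - before))"
  }

For example, `pmi_delta perf record -e trace_imc/trace_cycles/ sleep 5` would
be expected to report a delta at or near zero, provided nothing else on the
system is doing PMI-based sampling at the same time, since the counts are
system-wide.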