.. SPDX-License-Identifier: GPL-2.0
.. _imc:

===================================
IMC (In-Memory Collection Counters)
===================================

Anju T Sudhakar, 10 May 2019

.. contents::
    :depth: 3


Basic overview
==============

IMC (In-Memory Collection counters) is a hardware monitoring facility that
collects large numbers of hardware performance events at the Nest level (these
are on-chip but off-core), the Core level and the Thread level.

The Nest PMU counters are handled by Nest IMC microcode which runs in the OCC
(On-Chip Controller) complex. The microcode collects the counter data and moves
the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core-level PMU
counters give us the IMC counters' data per core, and thread-level PMU counters
give us the IMC counters' data per CPU thread.

OPAL obtains the IMC PMU and supported events information from the IMC Catalog
and passes it on to the kernel via the device tree. Each event's information
contains:

- Event name
- Event offset
- Event description

and possibly also:

- Event scale
- Event unit

Some PMUs may have common scale and unit values for all their supported
events. In those cases, the scale and unit properties for those events must be
inherited from the PMU.

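The inheritance rule can be sketched as follows. This is an illustrative Python
model only, not kernel code: the `Pmu` and `Event` classes, their field names
and the sample values are hypothetical, and letting a per-event value take
precedence over the PMU-wide one is an assumption.

.. code-block:: python

  from dataclasses import dataclass, field
  from typing import List, Optional


  @dataclass
  class Event:
      name: str
      offset: int                   # where the counter data accumulates
      description: str
      scale: Optional[str] = None   # None means "not given for this event"
      unit: Optional[str] = None


  @dataclass
  class Pmu:
      name: str
      scale: Optional[str] = None   # common values for all events, if any
      unit: Optional[str] = None
      events: List[Event] = field(default_factory=list)


  def effective_scale_unit(pmu, event):
      """Return (scale, unit) for an event, inheriting from the PMU if unset."""
      scale = event.scale if event.scale is not None else pmu.scale
      unit = event.unit if event.unit is not None else pmu.unit
      return scale, unit


  # Hypothetical PMU with common scale/unit and two made-up events.
  pmu = Pmu(name="example_pmu", scale="1.0", unit="MiB")
  inherits = Event("EXAMPLE_EVENT_A", 0x118, "hypothetical event")
  overrides = Event("EXAMPLE_EVENT_B", 0x120, "hypothetical event", unit="KiB")
  pmu.events += [inherits, overrides]

Here `effective_scale_unit(pmu, inherits)` yields the PMU-wide values, because
the event carries none of its own.
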
The event offset is the location in memory where the counter data gets
accumulated.

The IMC catalog is available at:
	https://github.com/open-power/ima-catalog

The kernel discovers the IMC counters information in the device tree at the
`imc-counters` device node, which has a compatible field
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs
and their events' information and registers each PMU and its attributes in the
kernel.

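Once a PMU is registered, its events are visible to user space through the
generic perf sysfs interface. The sketch below lists them, assuming the usual
layout `/sys/bus/event_source/devices/<pmu>/events/`, where optional
`<event>.scale` and `<event>.unit` files sit next to each event file. The
function name and the `base` parameter are illustrative, and which PMU names
exist depends on the system.

.. code-block:: python

  import os


  def list_pmu_events(pmu, base="/sys/bus/event_source/devices"):
      """Map each event name to its config string plus optional scale/unit."""
      events_dir = os.path.join(base, pmu, "events")
      events = {}
      for entry in sorted(os.listdir(events_dir)):
          with open(os.path.join(events_dir, entry)) as f:
              value = f.read().strip()
          name, dot, suffix = entry.rpartition(".")
          if dot and suffix in ("scale", "unit"):
              # "<event>.scale" / "<event>.unit" annotate the event itself.
              events.setdefault(name, {})[suffix] = value
          else:
              events.setdefault(entry, {})["config"] = value
      return events

On a machine that exposes the PMU, `list_pmu_events("core_imc")` would return
one dictionary entry per event.
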
IMC example usage
=================

.. code-block:: sh

  # perf list
  [...]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]
  [...]
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]
  [...]
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]

To see per-chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/:

.. code-block:: sh

  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket

To see non-idle instructions for core 0:

.. code-block:: sh

  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle cycles for a "make":

.. code-block:: sh

  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make


IMC Trace-mode
==============

POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In
Accumulation mode, event counts are accumulated in system memory. The
hypervisor then reads the posted counts periodically or when requested. In IMC
Trace mode, the 64-bit trace SCOM value is initialized with the event
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the
event to be monitored and the sampling duration. On each overflow in the
CPMCxSEL, hardware snapshots the program counter along with event counts and
writes them into the memory pointed to by LDBAR.

LDBAR is a 64-bit special-purpose per-thread register. It has bits to indicate
whether hardware is configured for accumulation or trace mode.

LDBAR Register Layout
---------------------

  +-------+----------------------+
  | 0     | Enable/Disable       |
  +-------+----------------------+
  | 1     | 0: Accumulation Mode |
  |       +----------------------+
  |       | 1: Trace Mode        |
  +-------+----------------------+
  | 2:3   | Reserved             |
  +-------+----------------------+
  | 4:6   | PB scope             |
  +-------+----------------------+
  | 7     | Reserved             |
  +-------+----------------------+
  | 8:50  | Counter Address      |
  +-------+----------------------+
  | 51:63 | Reserved             |
  +-------+----------------------+

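As an illustration, the layout above can be decoded in a few lines of Python.
The sketch assumes IBM bit numbering, in which bit 0 is the most significant
bit of the 64-bit register. The helper names are hypothetical, and the sample
value is constructed for the example rather than read from hardware.

.. code-block:: python

  # (first_bit, last_bit) per field, taken from the LDBAR layout table.
  LDBAR_FIELDS = {
      "enable": (0, 0),
      "trace_mode": (1, 1),
      "pb_scope": (4, 6),
      "counter_address": (8, 50),
  }


  def extract(value, first_bit, last_bit, width=64):
      """Extract bits first_bit..last_bit, counting bit 0 as the MSB."""
      shift = width - 1 - last_bit
      mask = (1 << (last_bit - first_bit + 1)) - 1
      return (value >> shift) & mask


  def decode_ldbar(value):
      """Split a raw 64-bit LDBAR value into its named, right-justified fields."""
      return {name: extract(value, first, last)
              for name, (first, last) in LDBAR_FIELDS.items()}


  # Bit 0 (enable) and bit 1 (trace mode) set, everything else clear.
  sample = (1 << 63) | (1 << 62)
  fields = decode_ldbar(sample)

Each field is returned right-justified, so `counter_address` here is the raw
content of bits 8:50 and not a full byte address.
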
TRACE_IMC_SCOM bit representation
---------------------------------

  +-------+------------+
  | 0:1   | SAMPSEL    |
  +-------+------------+
  | 2:33  | CPMC_LOAD  |
  +-------+------------+
  | 34:40 | CPMC1SEL   |
  +-------+------------+
  | 41:47 | CPMC2SEL   |
  +-------+------------+
  | 48:50 | BUFFERSIZE |
  +-------+------------+
  | 51:63 | RESERVED   |
  +-------+------------+

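A trace SCOM value can be assembled from the table above as sketched here. Bit
0 is assumed to be the most significant bit of the 64-bit value. The function
name and the field values in the example are placeholders rather than real
event selectors.

.. code-block:: python

  def trace_imc_scom(sampsel, cpmc_load, cpmc1sel, cpmc2sel, buffersize):
      """Pack the TRACE_IMC_SCOM fields into a 64-bit value (bit 0 = MSB)."""
      # (value, first_bit, last_bit) following the table.
      fields = [
          (sampsel, 0, 1),
          (cpmc_load, 2, 33),
          (cpmc1sel, 34, 40),
          (cpmc2sel, 41, 47),
          (buffersize, 48, 50),
      ]
      scom = 0
      for value, first_bit, last_bit in fields:
          width = last_bit - first_bit + 1
          if not 0 <= value < (1 << width):
              raise ValueError(f"value {value:#x} does not fit in {width} bits")
          scom |= value << (63 - last_bit)
      return scom


  # Placeholder field values, chosen only to show where each field lands.
  value = trace_imc_scom(sampsel=0x1, cpmc_load=0x1, cpmc1sel=0x1,
                         cpmc2sel=0x1, buffersize=0x1)

The reserved bits 51:63 are never written by this helper and therefore stay
zero.
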
CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the
event to count. BUFFERSIZE indicates the memory range. On each overflow,
hardware snapshots the program counter along with event counts, updates the
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC
hardware does not support exceptions, so it quietly wraps around when the
memory buffer reaches its end.

*Currently the event monitored for trace-mode is fixed to cycles.*

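The wrap-around behaviour can be modelled with a small ring buffer. This is a
software analogy only: the record contents, the buffer capacity and the class
name are invented for the illustration and do not reflect the hardware record
format.

.. code-block:: python

  class TraceBufferModel:
      """Fixed-size buffer that silently overwrites the oldest records."""

      def __init__(self, capacity):
          self.records = [None] * capacity
          self.next_slot = 0

      def write(self, program_counter, event_count):
          # No exception or notification when the end is reached:
          # the write position simply wraps back to the start.
          self.records[self.next_slot] = (program_counter, event_count)
          self.next_slot = (self.next_slot + 1) % len(self.records)


  # Five writes into a four-record buffer: the fifth lands in slot 0.
  buf = TraceBufferModel(capacity=4)
  for i in range(5):
      buf.write(program_counter=0x1000 + 4 * i, event_count=i)

After the fifth write the first record has been overwritten, so a reader that
does not keep up loses the older samples.
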
Trace IMC example usage
=======================

.. code-block:: sh

  # perf list
  [...]
  trace_imc/trace_cycles/                            [Kernel PMU event]

To record an application/process with the trace-imc event:

.. code-block:: sh

  # perf record -e trace_imc/trace_cycles/ yes > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ]

The generated `perf.data` can be read using `perf report`.

Benefits of using IMC trace-mode
================================

PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace
mode snapshots the program counter and writes it to memory. This also provides
a way for the operating system to do instruction sampling in real time without
PMI processing overhead.

The following compares the PMI counts when `perf top` is run first without and
then with the trace-imc event:

.. code-block:: sh

  # grep PMI /proc/interrupts
  PMI:          0          0          0          0   Performance monitoring interrupts
  # ./perf top
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts
  # ./perf top -e trace_imc/trace_cycles/
  ...
  # grep PMI /proc/interrupts
  PMI:      39735       8710      17338      17801   Performance monitoring interrupts

That is, the PMI interrupt counts do not increment when using the `trace_imc` event.
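
The comparison can also be scripted. The sketch below parses the `PMI` line of
`/proc/interrupts` and computes per-CPU deltas between two snapshots. The
function and variable names are illustrative, and the sample lines are the ones
shown in the session above.

.. code-block:: python

  def parse_pmi_counts(interrupts_text):
      """Return the per-CPU PMI counts from /proc/interrupts-style text."""
      for line in interrupts_text.splitlines():
          label, _, rest = line.strip().partition(":")
          if label == "PMI":
              counts = []
              for token in rest.split():
                  if not token.isdigit():
                      break          # reached the textual description
                  counts.append(int(token))
              return counts
      raise ValueError("no PMI line found")


  def pmi_delta(before, after):
      """Per-CPU increase in PMI count between two snapshots."""
      return [b - a for a, b in zip(parse_pmi_counts(before),
                                    parse_pmi_counts(after))]


  idle = "PMI:          0          0          0          0   Performance monitoring interrupts"
  after_perf_top = "PMI:      39735       8710      17338      17801   Performance monitoring interrupts"
  after_trace_imc = "PMI:      39735       8710      17338      17801   Performance monitoring interrupts"

With these samples, `pmi_delta(after_perf_top, after_trace_imc)` is zero for
every CPU, whereas `pmi_delta(idle, after_perf_top)` is not. On a live system
the snapshots would come from reading `/proc/interrupts` before and after each
`perf top` run.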