.. SPDX-License-Identifier: GPL-2.0

======================================
HiSilicon PCIe Tune and Trace device
======================================

Introduction
============

HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe link's events (tune),
and trace the TLP headers (trace). The two functions are independent,
but it is recommended to use them together to analyze and enhance the
PCIe link's performance.

On the Kunpeng 930 SoC, the PCIe Root Complex is composed of several
PCIe cores. Each PCIe core includes several Root Ports and a PTT
RCiEP, as shown below. The PTT device is capable of tuning and
tracing the links of the PCIe core.
::

          +--------------Core 0-------+
          |       |       [   PTT   ] |
          |       |       [Root Port]---[Endpoint]
          |       |       [Root Port]---[Endpoint]
          |       |       [Root Port]---[Endpoint]
    Root Complex  |------Core 1-------+
          |       |       [   PTT   ] |
          |       |       [Root Port]---[ Switch ]---[Endpoint]
          |       |       [Root Port]---[Endpoint] `-[Endpoint]
          |       |       [Root Port]---[Endpoint]
          +---------------------------+

The PTT device driver registers one PMU device for each PTT device.
The name of each PTT device is composed of the 'hisi_ptt' prefix followed
by the IDs of the SICL and the Core where it is located. The Kunpeng 930
SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and
IO dies (SICL, Super I/O Cluster), where there is one PCIe Root
Complex for each SICL.
::

    /sys/devices/hisi_ptt<sicl_id>_<core_id>
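
As an illustration of the naming scheme, the minimal Python sketch below
lists the PTT PMUs under /sys/devices and recovers the SICL and Core IDs
from each name. It assumes the IDs are decimal integers (as in the
`hisi_ptt0_2` example used later in this document); the helpers
`parse_ptt_pmu_name()` and `list_ptt_pmus()` are hypothetical and are not
part of the kernel or perf.
::

    import os
    import re

    # Documented scheme: hisi_ptt<sicl_id>_<core_id>; decimal IDs are an assumption.
    PTT_PMU_RE = re.compile(r"^hisi_ptt(\d+)_(\d+)$")


    def parse_ptt_pmu_name(name):
        """Return (sicl_id, core_id) for a PTT PMU name, or None otherwise."""
        m = PTT_PMU_RE.match(name)
        if m is None:
            return None
        return int(m.group(1)), int(m.group(2))


    def list_ptt_pmus(sysfs_root="/sys/devices"):
        """Return (sicl_id, core_id, path) for every PTT PMU directory found."""
        pmus = []
        for name in sorted(os.listdir(sysfs_root)):
            ids = parse_ptt_pmu_name(name)
            if ids is not None:
                pmus.append((*ids, os.path.join(sysfs_root, name)))
        return pmus

For example, `parse_ptt_pmu_name("hisi_ptt0_2")` returns `(0, 2)`.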

Tune
====

PTT tune is designed for monitoring and adjusting PCIe link parameters (events).
Two classes of events are currently supported. The scope of the events
covers the PCIe core to which the PTT device belongs.

Each event is presented as a file under $(PTT PMU dir)/tune, and
a simple open/read/write/close cycle is used to tune the event.
::

    $ cd /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune
    $ ls
    qos_tx_cpl    qos_tx_np    qos_tx_p
    tx_path_rx_req_alloc_buf_level
    tx_path_tx_req_alloc_buf_level
    $ cat qos_tx_p
    1
    $ echo 2 > qos_tx_p
    $ cat qos_tx_p
    2

The current (numerical) value of an event can be read from its file,
and the desired value can be written to the file to tune it.
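
The read/write cycle above can also be scripted. The following is a
minimal Python sketch; `read_tune()` and `write_tune()` are hypothetical
helper names, and `pmu_dir` is assumed to be a PTT PMU directory such as
/sys/devices/hisi_ptt<sicl_id>_<core_id>. Each call performs one
open/read/close or open/write/close cycle on a single event file.
::

    import os


    def read_tune(pmu_dir, event):
        """Return the current numerical value of a tune event."""
        with open(os.path.join(pmu_dir, "tune", event)) as f:
            return int(f.read().strip())


    def write_tune(pmu_dir, event, value):
        """Write a desired value to a tune event file.

        On real hardware, a rejected value (such as a negative number) is
        expected to surface as an OSError.
        """
        with open(os.path.join(pmu_dir, "tune", event), "w") as f:
            f.write(f"{value}\n")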

1. Tx Path QoS Control
----------------------

The following files are provided to tune the QoS of the Tx path of
the PCIe core.

- qos_tx_cpl: weight of Tx completion TLPs
- qos_tx_np: weight of Tx non-posted TLPs
- qos_tx_p: weight of Tx posted TLPs

The weight influences the proportion of each packet type on the PCIe link.
For example, in storage scenarios, increasing the proportion of completion
packets on the link can enhance performance, as more completions are
consumed.

The valid values for these events are 0, 1 and 2. Writing a negative
value returns an error, and out-of-range values are converted to 2.
Note that the event value only indicates an approximate level and is
not precise.
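
The value rules above can be captured in a small Python model, for
example to check a requested setting before writing it. This is a sketch
of the documented behaviour only, not of the driver code, and
`effective_qos_level()` and `favor_completions()` are hypothetical
helpers. `favor_completions()` follows the storage example by giving Tx
completion TLPs the highest weight, assuming `pmu_dir` is a PTT PMU
directory.
::

    import os


    def effective_qos_level(value):
        """Model the documented handling of a written QoS weight."""
        if value < 0:
            # Negative values are rejected with an error.
            raise ValueError("negative tune values are rejected")
        # Out-of-range values are converted to the maximum level, 2.
        return min(value, 2)


    def favor_completions(pmu_dir):
        """Set qos_tx_cpl to the highest level; other weights are untouched."""
        with open(os.path.join(pmu_dir, "tune", "qos_tx_cpl"), "w") as f:
            f.write(f"{effective_qos_level(2)}\n")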

2. Tx Path Buffer Control
-------------------------

The following files are provided to tune the buffers of the Tx path of
the PCIe core.

- rx_alloc_buf_level: watermark of Rx requests
- tx_alloc_buf_level: watermark of Tx requests

These events influence the watermark of the buffer allocated for each
type. Rx means inbound and Tx means outbound. Packets are first stored
in the buffer and then transmitted either when the watermark is reached
or when a timeout expires. For a busy direction, increasing the related
buffer watermark avoids posting too frequently and thus enhances
performance. In most cases, keep the default value.

The valid values for these events are 0, 1 and 2. Writing a negative
value returns an error, and out-of-range values are converted to 2.
Note that the event value only indicates an approximate level and is
not precise.
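
Since the default value is usually the right choice, it can help to
restore it after an experiment. The minimal Python sketch below saves the
current contents of some tune files and writes them back on exit;
`preserved_tune()` is a hypothetical helper, and `pmu_dir` is assumed to
be a PTT PMU directory.
::

    import contextlib
    import os


    @contextlib.contextmanager
    def preserved_tune(pmu_dir, events):
        """Save the given tune events and restore them when the block exits."""
        tune_dir = os.path.join(pmu_dir, "tune")
        saved = {}
        for event in events:
            with open(os.path.join(tune_dir, event)) as f:
                saved[event] = f.read().strip()
        try:
            yield saved
        finally:
            # Restore even if the experiment raised an exception.
            for event, value in saved.items():
                with open(os.path.join(tune_dir, event), "w") as f:
                    f.write(value + "\n")

Pass the buffer-level event names exactly as they appear under the `tune`
directory (see the `ls` output above).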

Trace
=====

PTT trace is designed for dumping the TLP headers to memory, which
can be used to analyze the transactions and usage of the PCIe link.
The traced headers can be filtered either by Requester ID or by a set
of Root Ports on the same core as the PTT device (tracing the headers
downstream of those ports). Headers of a certain type and of a certain
direction can also be selected.

You can use the `perf record` command to set the parameters, start the
trace and collect the data. The trace data can also be decoded with
`perf report`. The trace control parameters are passed as parameters of
the PTT PMU event, and each of them is described in the following
subsections. An example usage is:
::

    $ perf record -e hisi_ptt0_2/filter=0x80001,type=1,direction=1,format=1/ -- sleep 5

This will trace the TLP headers downstream of Root Port 0000:00:10.1 (the
value of the 'filter' parameter is 0x80001) with the type of posted TLP
requests, the outbound direction and the 8DW trace data format.
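
When the trace is started from a script, building the event string
programmatically avoids quoting and line-break mistakes on the command
line. In the Python sketch below, `ptt_event()` is a hypothetical helper;
it emits `direction` and `format` only when they are given, so that the
documented defaults (inbound, 4DW) apply otherwise.
::

    def ptt_event(pmu, filter_val, trace_type, direction=None, fmt=None):
        """Build a perf event string such as hisi_ptt0_2/filter=...,type=.../."""
        params = [f"filter={filter_val:#x}", f"type={trace_type}"]
        if direction is not None:
            params.append(f"direction={direction}")
        if fmt is not None:
            params.append(f"format={fmt}")
        return f"{pmu}/{','.join(params)}/"

The result can be passed to perf as a single argument, e.g.
`subprocess.run(["perf", "record", "-e", ptt_event("hisi_ptt0_2", 0x80001, 1, 1, 1), "--", "sleep", "5"])`.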

1. Filter
---------

The TLP headers to trace can be filtered by the Root Ports or by the Requester ID
of the Endpoint, which are located on the same core as the PTT device. You can
set the filter by specifying the `filter` parameter, which is required to start
the trace. The parameter value is 20 bits wide. Bit 19 indicates the filter type:
1 for a Root Port filter and 0 for a Requester filter. Bits [15:0] hold the
filter value. For Root Ports, the value is a mask of core port IDs, where the
port ID of a Root Port is calculated from its PCI slot ID as (slotid & 7) * 2.
For a Requester, the value is the Requester ID (the Device ID of the PCIe
function, i.e. its bus number and devfn). Bits [18:16] are currently reserved
for extension.

For example, if the desired filter is Endpoint function 0000:01:00.1, the filter
value will be 0x00101 (bus 0x01, device 0, function 1). If the desired filter is
Root Port 0000:00:10.0, the filter value is calculated as 0x80001: bit 19 is set
for a Root Port filter, and slot 0x10 gives port ID (0x10 & 7) * 2 = 0, i.e.
mask 0x1.
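
The calculation in the example can be sketched in Python as below. The
helper names are hypothetical; the bit layout (bit 19 for the filter
type, bits [15:0] for the value) and the port ID formula are taken from
the description above, and the Requester ID is assumed to use the usual
bus << 8 | device << 3 | function encoding.
::

    FILTER_ROOT_PORT = 1 << 19  # bit 19: 1 = Root Port filter, 0 = Requester


    def parse_bdf(bdf):
        """Split 'domain:bus:device.function' (hex fields) into integers."""
        domain, bus, devfn = bdf.split(":")
        device, function = devfn.split(".")
        return int(domain, 16), int(bus, 16), int(device, 16), int(function, 16)


    def root_port_filter(bdf):
        """Filter value for one Root Port: type bit plus its port ID mask bit."""
        _, _, slot, _ = parse_bdf(bdf)
        port_id = (slot & 7) * 2
        return FILTER_ROOT_PORT | (1 << port_id)


    def requester_filter(bdf):
        """Filter value for an Endpoint function: its Requester ID."""
        _, bus, device, function = parse_bdf(bdf)
        return (bus << 8) | (device << 3) | function

With these helpers, `root_port_filter("0000:00:10.0")` gives 0x80001 and
`requester_filter("0000:01:00.1")` gives 0x101, matching the values above.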

The driver also exposes every supported Root Port and Requester filter through
sysfs. Each filter is an individual file named after its related PCIe device
(domain:bus:device.function). The Root Port filter files are
under $(PTT PMU dir)/root_port_filters and the Requester filter files
are under $(PTT PMU dir)/requester_filters.
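
A short Python sketch can collect these files to see which filters are
currently available. `available_filters()` is a hypothetical helper and
`pmu_dir` is assumed to be a PTT PMU directory; only the file names are
used, since the contents of the files are not described here.
::

    import os


    def available_filters(pmu_dir):
        """Map each filter directory to the sorted PCIe device names in it."""
        result = {}
        for kind in ("root_port_filters", "requester_filters"):
            path = os.path.join(pmu_dir, kind)
            result[kind] = sorted(os.listdir(path)) if os.path.isdir(path) else []
        return result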

Note that multiple Root Ports can be specified at one time, but only one
Endpoint function can be specified in one trace. Specifying both Root Ports
and an Endpoint function at the same time is not supported. The driver
maintains a list of available filters and rejects invalid inputs.
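
The combination rules can be mirrored on the user side, for example to
catch mistakes before calling perf. In the Python sketch below,
`combine_filters()` is a hypothetical helper that takes core port IDs
(computed as (slotid & 7) * 2) or a single Requester ID; the driver
performs its own validation regardless.
::

    FILTER_ROOT_PORT = 1 << 19


    def combine_filters(port_ids=(), requester_id=None):
        """Build one `filter` value following the documented rules."""
        if port_ids and requester_id is not None:
            raise ValueError("Root Port and Requester filters cannot be mixed")
        if requester_id is not None:
            # Only one Endpoint function can be traced at a time.
            return requester_id & 0xFFFF
        if not port_ids:
            raise ValueError("the filter parameter is required")
        mask = 0
        for port_id in port_ids:
            if not 0 <= port_id < 16:
                raise ValueError(f"port ID {port_id} does not fit in bits [15:0]")
            mask |= 1 << port_id  # several Root Ports may be OR'ed together
        return FILTER_ROOT_PORT | mask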

The list of available filters is updated dynamically, so the filter information
stays correct when hotplug events happen or when you manually remove or rescan
devices.

2. Type
-------

You can trace the TLP headers of certain types by specifying the `type`
parameter, which is required to start the trace. The parameter value is
8 bits wide. The currently supported types and their values are:

- 8'b00000001: posted requests (P)
- 8'b00000010: non-posted requests (NP)
- 8'b00000100: completions (CPL)

You can specify multiple types when tracing inbound TLP headers, but only
one when tracing outbound TLP headers.
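
The `type` value is a bit mask, so the encoding and the outbound
restriction can be expressed as below. This is a minimal Python sketch;
`type_param()` and the `TLP_TYPES` table are hypothetical names built from
the list above.
::

    TLP_TYPES = {"P": 0b001, "NP": 0b010, "CPL": 0b100}


    def type_param(types, outbound=False):
        """OR the selected TLP type bits into the 8-bit `type` value."""
        selected = set(types)
        if not selected:
            raise ValueError("the type parameter is required")
        if outbound and len(selected) != 1:
            raise ValueError("outbound tracing accepts only one type")
        value = 0
        for name in selected:
            value |= TLP_TYPES[name]
        return value

For example, `type_param(["P", "CPL"])` returns 5 for inbound tracing,
while the same list with `outbound=True` raises `ValueError`.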

3. Direction
------------

You can trace the TLP headers of a certain direction, relative to the
Root Port or the PCIe core, by specifying the `direction` parameter.
This parameter is optional and defaults to inbound. The parameter value
is 4 bits wide. When the desired format is 4DW, the supported directions
and their values are:

- 4'b0000: inbound TLPs (P, NP, CPL)
- 4'b0001: outbound TLPs (P, NP, CPL)
- 4'b0010: outbound TLPs (P, NP, CPL) and inbound TLPs (P, NP, CPL B)
- 4'b0011: outbound TLPs (P, NP, CPL) and inbound TLPs (CPL A)

When the desired format is 8DW, the supported directions and their values are:

- 4'b0000: reserved
- 4'b0001: outbound TLPs (P, NP, CPL)
- 4'b0010: inbound TLPs (P, NP, CPL B)
- 4'b0011: inbound TLPs (CPL A)

Inbound completions are classified into two types:

- completion A (CPL A): completion of CHI/DMA/Native non-posted requests, except for CPL B
- completion B (CPL B): completion of DMA remote2local and P2P non-posted requests
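
Because the meaning of a `direction` value depends on the selected
format, a lookup keyed on both can help avoid the reserved 8DW encoding.
The Python table below simply transcribes the two lists above;
`describe_direction()` is a hypothetical helper.
::

    # (format, direction) -> traced TLPs; format 0 is 4DW, format 1 is 8DW.
    DIRECTIONS = {
        (0, 0b0000): "inbound TLPs (P, NP, CPL)",
        (0, 0b0001): "outbound TLPs (P, NP, CPL)",
        (0, 0b0010): "outbound TLPs (P, NP, CPL) and inbound TLPs (P, NP, CPL B)",
        (0, 0b0011): "outbound TLPs (P, NP, CPL) and inbound TLPs (CPL A)",
        # 4'b0000 is reserved for 8DW, so it is deliberately absent.
        (1, 0b0001): "outbound TLPs (P, NP, CPL)",
        (1, 0b0010): "inbound TLPs (P, NP, CPL B)",
        (1, 0b0011): "inbound TLPs (CPL A)",
    }


    def describe_direction(fmt, direction):
        """Return what a (format, direction) pair traces, or raise ValueError."""
        try:
            return DIRECTIONS[(fmt, direction)]
        except KeyError:
            raise ValueError(
                f"direction {direction} is not supported with format {fmt}") from None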

4. Format
---------

You can change the format of the traced TLP headers by specifying the
`format` parameter. The default format is 4DW. The parameter value is 4 bits
wide. The currently supported formats and their values are:

- 4'b0000: 4DW length per TLP header
- 4'b0001: 8DW length per TLP header

The traced TLP header format is different from the PCIe standard.

When using the 8DW data format, the entire TLP header is logged
(Header DW0-3 shown below). For example, the TLP header for Memory
Reads with 64-bit addresses is shown in PCIe r5.0, Figure 2-17;
the header for Configuration Requests is shown in Figure 2-20, etc.

In addition, 8DW trace buffer entries contain a timestamp and
possibly a PASID TLP prefix (see Figure 6-20, PCIe r5.0). If there
is no prefix, the Prefix field is all 0.

Bits [31:11] of DW0 are always 0x1fffff, which can be used to
distinguish the data format. The 8DW format is:
::

    bits [                 31:11                 ][       10:0       ]
         |---------------------------------------|-------------------|
     DW0 [                0x1fffff               ][ Reserved (0x7ff) ]
     DW1 [                       Prefix                              ]
     DW2 [                     Header DW0                            ]
     DW3 [                     Header DW1                            ]
     DW4 [                     Header DW2                            ]
     DW5 [                     Header DW3                            ]
     DW6 [                   Reserved (0x0)                          ]
     DW7 [                        Time                               ]
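
A Python sketch for splitting one 8DW entry into the fields above is
shown below. It assumes each DW is stored little-endian (an assumption,
not something this document states) and uses the DW0 marker to reject
data in another format; `parse_8dw_entry()` and `is_8dw_entry()` are
hypothetical names.
::

    import struct

    ENTRY_8DW_SIZE = 32  # 8 DWs of 4 bytes each


    def is_8dw_entry(dw0):
        """Bits [31:11] of DW0 are always 0x1fffff in the 8DW format."""
        return (dw0 >> 11) == 0x1FFFFF


    def parse_8dw_entry(entry):
        """Return the Prefix, Header DW0-3 and Time of one 8DW entry."""
        if len(entry) != ENTRY_8DW_SIZE:
            raise ValueError("an 8DW entry is 32 bytes long")
        dws = struct.unpack("<8I", entry)  # little-endian DWs (assumption)
        if not is_8dw_entry(dws[0]):
            raise ValueError("DW0 does not carry the 8DW marker")
        return {"prefix": dws[1], "header": dws[2:6], "time": dws[7]}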

When using the 4DW data format, DW0 of the trace buffer entry
contains selected fields of DW0 of the TLP, together with a
timestamp. DW1-DW3 of the trace buffer entry contain DW1-DW3
directly from the TLP header.

The 4DW format is:
::

    bits [31:30] [ 29:25 ][24][23][22][21][    20:11   ][    10:0    ]
         |-----|---------|---|---|---|---|-------------|-------------|
     DW0 [ Fmt ][  Type  ][T9][T8][TH][SO][   Length   ][    Time    ]
     DW1 [                     Header DW1                            ]
     DW2 [                     Header DW2                            ]
     DW3 [                     Header DW3                            ]
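
The DW0 bit fields of a 4DW entry can be extracted with shifts and masks,
as in the minimal Python sketch below. The field names follow the diagram;
`decode_4dw_dw0()` is a hypothetical helper, and it only splits the bits
without interpreting them.
::

    def decode_4dw_dw0(dw0):
        """Split DW0 of a 4DW trace entry into its fields."""
        return {
            "fmt": (dw0 >> 30) & 0x3,       # bits [31:30]
            "type": (dw0 >> 25) & 0x1F,     # bits [29:25]
            "t9": (dw0 >> 24) & 0x1,        # bit 24
            "t8": (dw0 >> 23) & 0x1,        # bit 23
            "th": (dw0 >> 22) & 0x1,        # bit 22
            "so": (dw0 >> 21) & 0x1,        # bit 21
            "length": (dw0 >> 11) & 0x3FF,  # bits [20:11]
            "time": dw0 & 0x7FF,            # bits [10:0]
        }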

5. Memory Management
--------------------

The traced TLP headers are written to memory allocated by the driver.
The hardware accepts 4 DMA addresses with buffers of the same size and
writes to the buffers sequentially, as shown below. When the buffer at
DMA addr 3 is filled and the trace is still on, the hardware wraps around
to DMA addr 0.
::

    +->[DMA addr 0]->[DMA addr 1]->[DMA addr 2]->[DMA addr 3]-+
    +---------------------------------------------------------+

The driver allocates each DMA buffer with a size of 4MiB. Each finished
buffer is copied to the perf AUX buffer allocated by the perf core.
If the AUX buffer becomes full while the trace is still on, the driver
commits the AUX buffer first and then requests a new one of the same
size. The AUX buffer size defaults to 16MiB. Users can adjust the size
with the `-m` option of the perf command.
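
The sizes above lead to some simple arithmetic, sketched in Python below.
Entry sizes are derived from the format diagrams (4 or 8 DWs of 4 bytes);
the constant and function names are hypothetical.
::

    MiB = 1024 * 1024
    DMA_BUF_COUNT = 4            # the hardware accepts 4 DMA addresses
    DMA_BUF_SIZE = 4 * MiB       # size of each DMA buffer allocated by the driver
    DEFAULT_AUX_SIZE = 16 * MiB  # default perf AUX buffer size
    ENTRY_SIZE = {"4DW": 16, "8DW": 32}  # bytes per traced TLP header


    def entries_that_fit(buffer_size, fmt):
        """Number of complete trace entries a buffer of this size can hold."""
        return buffer_size // ENTRY_SIZE[fmt]


    def next_dma_buffer(index):
        """Buffers are filled in order, wrapping from addr 3 back to addr 0."""
        return (index + 1) % DMA_BUF_COUNT

For instance, one 4MiB DMA buffer holds 131072 8DW entries or 262144 4DW
entries, and the default 16MiB AUX buffer holds 524288 8DW entries.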

6. Decoding
-----------

You can decode the traced data with the `perf report -D` command (currently
only dumping the raw trace data is supported). The traced data is decoded
according to the formats described previously (taking 8DW as an example):
::

    [...perf headers and other information]
    . ... HISI PTT data: size 4194304 bytes
    .  00000000: 00 00 00 00                                 Prefix
    .  00000004: 01 00 00 60                                 Header DW0
    .  00000008: 0f 1e 00 01                                 Header DW1
    .  0000000c: 04 00 00 00                                 Header DW2
    .  00000010: 40 00 81 02                                 Header DW3
    .  00000014: 33 c0 04 00                                 Time
    .  00000020: 00 00 00 00                                 Prefix
    .  00000024: 01 00 00 60                                 Header DW0
    .  00000028: 0f 1e 00 01                                 Header DW1
    .  0000002c: 04 00 00 00                                 Header DW2
    .  00000030: 40 00 81 02                                 Header DW3
    .  00000034: 02 00 00 00                                 Time
    .  00000040: 00 00 00 00                                 Prefix
    .  00000044: 01 00 00 60                                 Header DW0
    .  00000048: 0f 1e 00 01                                 Header DW1
    .  0000004c: 04 00 00 00                                 Header DW2
    .  00000050: 40 00 81 02                                 Header DW3
    [...]
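
The dump above can be post-processed with a short script. The Python
sketch below parses lines of the form shown and converts each group of
four bytes into a DW; it assumes the bytes are printed in memory order
and stored little-endian, and `parse_dump()` and `tlp_fmt_type()` are
hypothetical helpers. Under that assumption, the Header DW0 bytes
`01 00 00 60` read as 0x60000001, whose PCIe Fmt field is 0b011 and Type
field is 0b00000, i.e. a Memory Write request with a 4DW header.
::

    import re

    # Matches lines such as ".  00000004: 01 00 00 60    Header DW0".
    DUMP_LINE_RE = re.compile(
        r"^\.\s+([0-9a-f]{8}):\s+((?:[0-9a-f]{2}\s){3}[0-9a-f]{2})\s+(\S.*?)\s*$")


    def parse_dump(lines):
        """Yield (offset, field_name, dw_value) for each decoded dump line."""
        for line in lines:
            m = DUMP_LINE_RE.match(line.strip())
            if m is None:
                continue  # perf headers and other information
            value = int.from_bytes(bytes.fromhex(m.group(2)), "little")
            yield int(m.group(1), 16), m.group(3), value


    def tlp_fmt_type(header_dw0):
        """Return the PCIe Fmt (bits [31:29]) and Type (bits [28:24])."""
        return (header_dw0 >> 29) & 0x7, (header_dw0 >> 24) & 0x1F

The input lines can come from, e.g., the standard output of
`perf report -D`.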