.. SPDX-License-Identifier: GPL-2.0

======================================
HiSilicon PCIe Tune and Trace device
======================================

Introduction
============

HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
integrated Endpoint (RCiEP) device, providing the capability
to dynamically monitor and tune the PCIe link's events (tune),
and trace the TLP headers (trace). The two functions are independent,
but it is recommended to use them together to analyze and enhance the
PCIe link's performance.

On Kunpeng 930 SoC, the PCIe Root Complex is composed of several
PCIe cores. Each PCIe core includes several Root Ports and a PTT
RCiEP, as shown below. The PTT device is capable of tuning and
tracing the links of the PCIe core.
::

          +--------------Core 0-------+
          |       |       [   PTT   ] |
          |       |       [Root Port]---[Endpoint]
          |       |       [Root Port]---[Endpoint]
          |       |       [Root Port]---[Endpoint]
    Root Complex  |------Core 1-------+
          |       |       [   PTT   ] |
          |       |       [Root Port]---[ Switch ]---[Endpoint]
          |       |       [Root Port]---[Endpoint] `-[Endpoint]
          |       |       [Root Port]---[Endpoint]
          +---------------------------+

The PTT device driver registers one PMU device for each PTT device.
The name of each PTT device is composed of the 'hisi_ptt' prefix followed by
the IDs of the SICL and the Core where it is located. The Kunpeng 930
SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and
IO dies (SICL, Super I/O Cluster), where there's one PCIe Root
Complex for each SICL.
::

    /sys/devices/hisi_ptt<sicl_id>_<core_id>

Tune
====

PTT tune is designed for monitoring and adjusting PCIe link parameters (events).
Currently two classes of events are supported. The scope of the events
covers the PCIe core to which the PTT device belongs.

Each event is presented as a file under $(PTT PMU dir)/tune, and
a simple open/read/write/close cycle is used to tune the event.
::

    $ cd /sys/devices/hisi_ptt<sicl_id>_<core_id>/tune
    $ ls
    qos_tx_cpl        qos_tx_np        qos_tx_p
    tx_path_rx_req_alloc_buf_level
    tx_path_tx_req_alloc_buf_level
    $ cat qos_tx_cpl
    1
    $ echo 2 > qos_tx_cpl
    $ cat qos_tx_cpl
    2

The current (numerical) value of an event can be read from its file,
and a new value can be written to the file to tune the event.

1. Tx Path QoS Control
----------------------

The following files are provided to tune the QoS of the tx path of
the PCIe core.

- qos_tx_cpl: weight of Tx completion TLPs
- qos_tx_np: weight of Tx non-posted TLPs
- qos_tx_p: weight of Tx posted TLPs

The weight influences the proportion of certain packets on the PCIe link.
For example, in a storage scenario, increase the proportion
of the completion packets on the link to enhance the performance, as
more completions are consumed.

The valid values for these events are 0, 1 and 2.
Writing a negative value will return an error, and out-of-range
values will be converted to 2. Note that the event value only
indicates an approximate level and is not precise.

2. Tx Path Buffer Control
-------------------------

The following files are provided to tune the buffer of the tx path of the
PCIe core.

- rx_alloc_buf_level: watermark of Rx requested
- tx_alloc_buf_level: watermark of Tx requested

These events influence the watermark of the buffer allocated for each
type. Rx means inbound while Tx means outbound. The packets will
be stored in the buffer first and then transmitted either when the
watermark is reached or when a timeout occurs.
For a busy direction, you should
increase the related buffer watermark to avoid frequent posting and
thus enhance the performance. In most cases just keep the default value.

The valid values for the above events are 0, 1 and 2.
Writing a negative value will return an error, and out-of-range
values will be converted to 2. Note that the event value only
indicates an approximate level and is not precise.

Trace
=====

PTT trace is designed for dumping the TLP headers to memory, which
can be used to analyze the transactions and usage condition of the PCIe
Link. You can filter the traced headers either by Requester ID or by
a set of Root Ports on the same core as the PTT device, in which case the
headers downstream of those Root Ports are traced. You can also restrict
the trace to headers of a certain type and of a certain direction.

You can use the perf command `perf record` to set the parameters, start
the trace and get the data. Decoding the trace data with `perf report`
is also supported. The control parameters for the trace are passed
as an event code for each event, which is illustrated further below.
An example usage is like
::

    $ perf record -e hisi_ptt0_2/filter=0x80001,type=1,direction=1,
      format=1/ -- sleep 5

This will trace the TLP headers downstream root port 0000:00:10.1 (event
code for event 'filter' is 0x80001) with type of posted TLP requests,
direction of inbound and traced data format of 8DW.

1. Filter
---------

The TLP headers to trace can be filtered by the Root Ports or the Requester ID
of the Endpoint, which are located on the same core as the PTT device. You can
set the filter by specifying the `filter` parameter, which is required to start
the trace. The parameter value is 20 bit. Bit 19 indicates the filter type:
1 for a Root Port filter and 0 for a Requester filter. Bit[15:0] indicates the
filter value. The value for a Root Port is a mask with the bit of its core port
id set, where the core port id is calculated from its PCI Slot ID as
(slotid & 7) * 2. The value for a Requester is the Requester ID (Device ID of
the PCIe function). Bit[18:16] is currently reserved for extension.

For example, if the desired filter is Endpoint function 0000:01:00.1, the
Requester ID is 0x0101 (bus 0x01, device 0x00, function 1) and the filter
value will be 0x00101. If the desired filter is Root Port 0000:00:10.0, the
core port id is (0x10 & 7) * 2 = 0, so bit 0 is set in the mask and, with
bit 19 set, the filter value is calculated as 0x80001.

The driver also presents every supported Root Port and Requester filter through
sysfs.
Each filter is an individual file named after its related PCIe
device (domain:bus:device.function). The files of Root Port filters are
under $(PTT PMU dir)/root_port_filters and the files of Requester filters
are under $(PTT PMU dir)/requester_filters.

Note that multiple Root Ports can be specified at one time, but only one
Endpoint function can be specified in one trace. Specifying both a Root Port
and a function at the same time is not supported. The driver maintains a list
of available filters and checks for invalid inputs.

The available filters are updated dynamically, which means you will always
get correct filter information when hotplug events happen, or when you manually
remove/rescan the devices.

2. Type
-------

You can trace the TLP headers of certain types by specifying the `type`
parameter, which is required to start the trace. The parameter value is
8 bit. Currently supported types and related values are shown below:

- 8'b00000001: posted requests (P)
- 8'b00000010: non-posted requests (NP)
- 8'b00000100: completions (CPL)

You can specify multiple types when tracing inbound TLP headers, but can only
specify one when tracing outbound TLP headers.

3. Direction
------------

You can trace the TLP headers from a certain direction, which is relative
to the Root Port or the PCIe core, by specifying the `direction` parameter.
This is optional and the default parameter is inbound. The parameter value
is 4 bit. When the desired format is 4DW, the supported directions and
related values are shown below:

- 4'b0000: inbound TLPs (P, NP, CPL)
- 4'b0001: outbound TLPs (P, NP, CPL)
- 4'b0010: outbound TLPs (P, NP, CPL) and inbound TLPs (P, NP, CPL B)
- 4'b0011: outbound TLPs (P, NP, CPL) and inbound TLPs (CPL A)

When the desired format is 8DW, the supported directions and related values
are shown below:

- 4'b0000: reserved
- 4'b0001: outbound TLPs (P, NP, CPL)
- 4'b0010: inbound TLPs (P, NP, CPL B)
- 4'b0011: inbound TLPs (CPL A)

Inbound completions are classified into two types:

- completion A (CPL A): completion of CHI/DMA/Native non-posted requests, except for CPL B
- completion B (CPL B): completion of DMA remote2local and P2P non-posted requests

4. Format
---------

You can change the format of the traced TLP headers by specifying the
`format` parameter. The default format is 4DW. The parameter value is 4 bit.
Currently supported formats and related values are shown below:

- 4'b0000: 4DW length per TLP header
- 4'b0001: 8DW length per TLP header

The traced TLP header format is different from the PCIe standard.

When using the 8DW data format, the entire TLP header is logged
(Header DW0-3 shown below). For example, the TLP header for Memory
Reads with 64-bit addresses is shown in PCIe r5.0, Figure 2-17;
the header for Configuration Requests is shown in Figure 2-20, etc.

In addition, 8DW trace buffer entries contain a timestamp and
possibly a prefix, such as a PASID TLP prefix (see Figure 6-20, PCIe r5.0).
If there is no prefix, the Prefix field is all 0.

Bit[31:11] of DW0 is always 0x1fffff, which can be
used to distinguish the data format. 8DW format is like
::

    bits [                 31:11                 ][       10:0       ]
         |---------------------------------------|-------------------|
     DW0 [                0x1fffff               ][ Reserved (0x7ff) ]
     DW1 [                       Prefix                              ]
     DW2 [                     Header DW0                            ]
     DW3 [                     Header DW1                            ]
     DW4 [                     Header DW2                            ]
     DW5 [                     Header DW3                            ]
     DW6 [                   Reserved (0x0)                          ]
     DW7 [                        Time                               ]

When using the 4DW data format, DW0 of the trace buffer entry
contains selected fields of DW0 of the TLP, together with a
timestamp.
DW1-DW3 of the trace buffer entry contain DW1-DW3
directly from the TLP header.

4DW format is like
::

    bits [31:30] [ 29:25 ][24][23][22][21][    20:11   ][    10:0    ]
         |-----|---------|---|---|---|---|-------------|-------------|
     DW0 [ Fmt ][  Type  ][T9][T8][TH][SO][   Length   ][    Time    ]
     DW1 [                     Header DW1                            ]
     DW2 [                     Header DW2                            ]
     DW3 [                     Header DW3                            ]

5. Memory Management
--------------------

The traced TLP headers will be written to the memory allocated
by the driver. The hardware accepts 4 DMA addresses of the same size,
and writes the buffers sequentially as shown below. If DMA addr 3 is
finished and the trace is still on, it will return to addr 0.
::

    +->[DMA addr 0]->[DMA addr 1]->[DMA addr 2]->[DMA addr 3]-+
    +---------------------------------------------------------+

The driver allocates 4MiB for each DMA buffer. A finished buffer
will be copied to the perf AUX buffer allocated by the perf core.
Once the AUX buffer is full while the trace is still on, the driver
will commit the AUX buffer first and then apply for a new one of
the same size. The size of the AUX buffer defaults to 16MiB. The user can
adjust the size by specifying the `-m` parameter of the perf command.

6. Decoding
-----------

You can decode the traced data with the `perf report -D` command (currently
only dumping the raw trace data is supported). The traced data will be decoded
according to the format described previously (take 8DW as an example):
::

    [...perf headers and other information]
    . ... HISI PTT data: size 4194304 bytes
    .  00000000: 00 00 00 00                                 Prefix
    .  00000004: 01 00 00 60                                 Header DW0
    .  00000008: 0f 1e 00 01                                 Header DW1
    .  0000000c: 04 00 00 00                                 Header DW2
    .  00000010: 40 00 81 02                                 Header DW3
    .  00000014: 33 c0 04 00                                 Time
    .  00000020: 00 00 00 00                                 Prefix
    .  00000024: 01 00 00 60                                 Header DW0
    .  00000028: 0f 1e 00 01                                 Header DW1
    .  0000002c: 04 00 00 00                                 Header DW2
    .  00000030: 40 00 81 02                                 Header DW3
    .  00000034: 02 00 00 00                                 Time
    .  00000040: 00 00 00 00                                 Prefix
    .  00000044: 01 00 00 60                                 Header DW0
    .  00000048: 0f 1e 00 01                                 Header DW1
    .  0000004c: 04 00 00 00                                 Header DW2
    .  00000050: 40 00 81 02                                 Header DW3
    [...]
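
The entry layouts described in the Format section can also be unpacked by
hand. The following is a minimal Python sketch, not part of perf or of the
driver. It assumes each DW is stored as a little-endian 32-bit word, which
this document does not state, and the helper names (`is_8dw`, `decode_8dw`,
`decode_4dw_dw0`) are hypothetical. It only splits entries into the fields
shown in the 8DW and 4DW diagrams and does not interpret the TLP header
itself.

```python
import struct

# Assumption: each DW is stored as a little-endian 32-bit word.
ENTRY_8DW = struct.Struct("<8I")
MARKER_8DW = 0x1FFFFF  # value of DW0 bits [31:11] in the 8DW format


def is_8dw(dw0):
    """Return True if DW0 carries the 8DW marker in bits [31:11]."""
    return ((dw0 >> 11) & 0x1FFFFF) == MARKER_8DW


def decode_8dw(buf):
    """Yield a dict for every complete 32-byte 8DW entry in buf."""
    usable = len(buf) - len(buf) % ENTRY_8DW.size
    for off in range(0, usable, ENTRY_8DW.size):
        dw = ENTRY_8DW.unpack_from(buf, off)
        if not is_8dw(dw[0]):
            raise ValueError(f"entry at offset {off:#x} is not in 8DW format")
        # DW1 is the prefix, DW2-DW5 are Header DW0-3, DW7 is the timestamp.
        # DW0 (marker) and DW6 (reserved) carry no trace information.
        yield {"prefix": dw[1], "header": dw[2:6], "time": dw[7]}


def decode_4dw_dw0(dw0):
    """Split DW0 of a 4DW entry into the fields of the 4DW diagram."""
    return {
        "fmt": (dw0 >> 30) & 0x3,       # bits [31:30]
        "type": (dw0 >> 25) & 0x1F,     # bits [29:25]
        "t9": (dw0 >> 24) & 0x1,        # bit 24
        "t8": (dw0 >> 23) & 0x1,        # bit 23
        "th": (dw0 >> 22) & 0x1,        # bit 22
        "so": (dw0 >> 21) & 0x1,        # bit 21
        "length": (dw0 >> 11) & 0x3FF,  # bits [20:11]
        "time": dw0 & 0x7FF,            # bits [10:0]
    }
```

Given `data`, a bytes object holding raw 8DW trace entries, `list(decode_8dw(data))`
returns one dict per entry. How the raw bytes are extracted from the perf data
file is outside the scope of this sketch.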