======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
HHA and DDRC etc. The available events and configuration options shall
be described in the sysfs, see:

/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command shall list the available events from sysfs.
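
As a rough illustration of the sysfs layout above, the following Python
sketch enumerates the HiSilicon uncore PMUs found under
/sys/bus/event_source/devices and splits each name into its SCCL ID, module
and index. The helper names parse_pmu_name() and list_hisi_pmus() are
hypothetical and exist only for this example; the regular expression is an
assumption derived from the hisi_sccl{X}_<module>{Y} template.

```python
import os
import re

# Assumed name layout: hisi_sccl<X>_<module><Y>, e.g. hisi_sccl3_l3c0.
# The module part is matched lazily so that trailing digits form the index.
PMU_NAME = re.compile(r"^hisi_sccl(\d+)_([a-z][a-z0-9]*?)(\d+)$")


def parse_pmu_name(name):
    """Return (sccl_id, module, index), or None if the name does not match."""
    m = PMU_NAME.match(name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


def list_hisi_pmus(root="/sys/bus/event_source/devices"):
    """Return the sorted list of parsed HiSilicon PMU names found under root."""
    if not os.path.isdir(root):
        return []
    found = []
    for entry in os.listdir(root):
        parsed = parse_pmu_name(entry)
        if parsed is not None:
            found.append(parsed)
    return sorted(found)


if __name__ == "__main__":
    for sccl_id, module, index in list_hisi_pmus():
        print(f"SCCL {sccl_id}: {module.upper()} index {index}")
```

On a machine without these PMUs the script prints nothing.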

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
SCCL ID #1.

The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.

Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
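
When several such commands have to be generated from a script, the event
strings can be assembled programmatically. The following is a minimal Python
sketch, not part of the driver or of perf: event_spec() and perf_stat_argv()
are hypothetical helpers that only format the same "perf stat -a -e" command
lines shown above.

```python
import shlex


def event_spec(pmu, event=None, **params):
    """Build an event string such as 'hisi_sccl3_l3c0/config=0x2/'."""
    terms = [] if event is None else [event]
    # Numeric parameters are written in hex, e.g. config=0x2.
    terms += [f"{key}={value:#x}" for key, value in params.items()]
    if not terms:
        raise ValueError("need an event name or at least one parameter")
    return f"{pmu}/{','.join(terms)}/"


def perf_stat_argv(events, workload=("sleep", "5")):
    """Return the argv list for a system-wide 'perf stat' run."""
    return ["perf", "stat", "-a", "-e", ",".join(events), *workload]


if __name__ == "__main__":
    events = [
        event_spec("hisi_sccl3_l3c0", "rd_hit_cpipe"),
        event_spec("hisi_sccl3_l3c0", config=0x02),
    ]
    # Print the command instead of running it; running needs the hardware.
    print(shlex.join(perf_stat_argv(events)))
```

Note that 0x2 and 0x02 denote the same config value.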

For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.

2. Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations; other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.

3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

etc. It is mainly helpful for finding which data source is nearest to the CPU
cores. If datasrc_cfg is used on a multi-chip system, datasrc_skt shall be
configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5

4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, including a 6-bit SCCL-ID and a 5-bit
CCL/ICL-ID. For I/O dies, the ICL-ID values are as follows:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;

5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
uring channel. It is 2 bits.
Some important codes are as follows:

- 2'b11: count the events which are sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which are sent to the uring (non-MATA) channel;
- 2'b00: default value, count the events which are sent to both the uring and
  uring_ext channels;

Users can configure IDs to count data coming from a specific CCL/ICL by setting
srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by setting
tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU will not
check the bit when matching against the srcid_cmd/tgtid_cmd.

If all of these options are disabled, the default behaviour applies: the PMU
does not distinguish the filter conditions and ID information, and the PMU
counters return the total counter values.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.
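
To illustrate the srcid_cmd/srcid_msk (and tgtid_cmd/tgtid_msk) matching rule
described above, here is a small Python model of the comparison. It is only a
sketch of the documented semantics, not driver code. The function names are
hypothetical, and placing the 6-bit SCCL-ID in the upper bits of the 11-bit ID
is an assumption made for the example; this document does not state the bit
order.

```python
ID_BITS = 11                    # 6-bit SCCL-ID + 5-bit CCL/ICL-ID
ID_MASK = (1 << ID_BITS) - 1


def make_id(sccl_id, cl_id):
    """Pack an 11-bit ID. ASSUMPTION: SCCL-ID occupies the upper 6 bits."""
    if not 0 <= sccl_id < (1 << 6):
        raise ValueError("SCCL-ID must fit in 6 bits")
    if not 0 <= cl_id < (1 << 5):
        raise ValueError("CCL/ICL-ID must fit in 5 bits")
    return (sccl_id << 5) | cl_id


def id_matches(observed, cmd, msk):
    """Compare observed with cmd, ignoring every bit that is set in msk."""
    care = ~msk & ID_MASK       # bits the PMU still checks
    return (observed & care) == (cmd & care)


if __name__ == "__main__":
    # Mask out the low 5 bits: any CCL/ICL inside SCCL 3 matches.
    print(id_matches(make_id(3, 1), make_id(3, 0), 0x1F))   # True
    print(id_matches(make_id(2, 1), make_id(3, 0), 0x1F))   # False
```

With a mask of 0 every bit is checked, so only an exact ID match is counted.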