======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver registers a perf PMU for each of them (L3C,
HHA, DDRC, etc.). The available events and configuration options are
described in sysfs, see:

/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command lists the available events from sysfs.
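
The sysfs layout above can also be walked directly. The following is a minimal
sketch, not part of the driver or of perf: it assumes the usual perf sysfs
convention that each PMU directory contains an ``events`` subdirectory with one
file per event, and the helper name ``list_hisi_events`` is hypothetical.

```python
import os


def list_hisi_events(root="/sys/bus/event_source/devices"):
    """Return {pmu_name: [event names]} for every hisi_sccl* PMU under root.

    Assumption: each PMU directory has an "events" subdirectory holding one
    file per event, as is conventional for perf PMUs exposed through sysfs.
    """
    result = {}
    if not os.path.isdir(root):
        # No perf event sources visible (for example, not a Linux system).
        return result
    for pmu in sorted(os.listdir(root)):
        if not pmu.startswith("hisi_sccl"):
            continue
        events_dir = os.path.join(root, pmu, "events")
        if os.path.isdir(events_dir):
            result[pmu] = sorted(os.listdir(events_dir))
    return result
```

Calling ``list_hisi_events()`` on a machine without these PMUs simply returns an
empty dictionary; the ``root`` parameter exists only so the walk can be pointed
at a different directory tree.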

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name will appear in event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is the READ_HIT_CPIPE event of L3C index #0
in SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is the RX_OPERATIONS event of HHA index #0
in SCCL ID #1.

The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.

Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5
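
The naming scheme hisi_sccl<sccl-id>_module<index-id> can be decoded
mechanically, which is handy when post-processing perf output. The sketch below
is illustrative only: the helper names are hypothetical, and it assumes the
module is one of the three named in this document (l3c, hha, ddrc).

```python
import re

# Assumption: only the module names mentioned in this document are accepted.
_PMU_NAME = re.compile(
    r"^hisi_sccl(?P<sccl>\d+)_(?P<module>l3c|hha|ddrc)(?P<index>\d+)$"
)


def parse_pmu_name(name):
    """Split a PMU name into (sccl_id, module, index_id)."""
    m = _PMU_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised HiSilicon uncore PMU name: {name!r}")
    return int(m["sccl"]), m["module"], int(m["index"])


def split_event(spec):
    """Split an event spec such as 'pmu/event/' into (pmu, event)."""
    pmu, _, event = spec.strip().rstrip("/").partition("/")
    return pmu, event
```

For instance, ``parse_pmu_name("hisi_sccl3_l3c0")`` yields SCCL ID 3, module
``l3c`` and index 0, matching the first example above.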

For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

(a) L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.

(b) Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations; other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.

(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

etc. It is mainly helpful for finding out which data source is nearest to the
CPU cores. If datasrc_cfg is used on a multi-chip system, datasrc_skt shall be
configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5

(d) Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, including a 6-bit SCCL-ID and a 5-bit
CCL/ICL-ID. For an I/O die, the ICL-ID is as follows:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;

Users can configure IDs to count data coming from a specific CCL/ICL by
setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by
setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU
will not check the bit when matching against the srcid_cmd/tgtid_cmd.
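
The v2 filter fields described in (a), (b) and (d) are plain bit fields, so
their values can be computed rather than written by hand. The following is a
rough sketch under stated assumptions, not a description of the hardware: all
helper names are hypothetical, ``id_matches`` is only a software model of the
mask rule quoted above, and ``make_id`` assumes that the SCCL-ID occupies the
upper 6 bits and the CCL/ICL-ID the lower 5 bits, an ordering this document
does not state.

```python
# tt_req encodings as listed in (b); other values are reserved.
TT_REQ = {
    "read": 0b100,
    "write": 0b101,
    "atomic_store": 0b110,
    "atomic_non_store": 0b111,
}


def core_bitmap(cores):
    """Build a tt_core bitmap with one bit set per core/thread index."""
    bitmap = 0
    for core in cores:
        bitmap |= 1 << core
    return bitmap


def make_id(sccl_id, cluster_id):
    """Pack an 11-bit ID from a 6-bit SCCL-ID and a 5-bit CCL/ICL-ID.

    ASSUMPTION: SCCL-ID in the upper 6 bits, CCL/ICL-ID in the lower 5 bits.
    """
    if not 0 <= sccl_id < 64 or not 0 <= cluster_id < 32:
        raise ValueError("SCCL-ID must fit in 6 bits, CCL/ICL-ID in 5 bits")
    return (sccl_id << 5) | cluster_id


def id_matches(observed, cmd, msk):
    """Model the mask rule: bits set in msk are not compared against cmd."""
    care = ~msk & 0x7FF  # only the unmasked bits of the 11-bit ID are checked
    return (observed & care) == (cmd & care)


def event_spec(pmu, config, **fields):
    """Format an event spec such as 'pmu/config=0x02,tt_core=0x3/'."""
    parts = [f"config={config:#04x}"]
    parts += [f"{key}={value:#x}" for key, value in fields.items()]
    return f"{pmu}/{','.join(parts)}/"
```

For example, ``event_spec("hisi_sccl3_l3c0", 0x02, tt_core=core_bitmap([0, 1]))``
reproduces the event string used in (a), and passing ``tt_req=TT_REQ["read"]``
instead reproduces the one in (b).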

If all of these options are disabled, the PMU works with the default values,
which apply no filter condition or ID matching, and returns the total counter
values in the PMU counters.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.