======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver registers a perf PMU for each device, such as
L3C, HHA and DDRC. The available events and configuration options are
described in sysfs, see:

/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command lists the available events from sysfs.

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name appears in the event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 in
SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
SCCL ID #1.
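
The naming scheme above can be sketched as a few shell helpers. This is only
an illustration: the function names (hisi_pmu_name, hisi_pmu_sysfs_dir,
hisi_pmu_event) are hypothetical and not part of perf or the driver; only the
hisi_sccl<sccl-id>_module<index-id> pattern and the sysfs directory come from
the text above.

```shell
# Hypothetical helpers; only the naming pattern comes from this document.

# hisi_pmu_name <sccl-id> <module> <index-id>
# Prints the perf PMU name, e.g. hisi_sccl3_l3c0.
hisi_pmu_name() {
    printf 'hisi_sccl%s_%s%s\n' "$1" "$2" "$3"
}

# hisi_pmu_sysfs_dir <sccl-id> <module> <index-id>
# Prints the sysfs directory describing that PMU.
hisi_pmu_sysfs_dir() {
    printf '/sys/bus/event_source/devices/%s\n' "$(hisi_pmu_name "$1" "$2" "$3")"
}

# hisi_pmu_event <sccl-id> <module> <index-id> <event>
# Prints an event specifier suitable for "perf stat -e".
hisi_pmu_event() {
    printf '%s/%s/\n' "$(hisi_pmu_name "$1" "$2" "$3")" "$4"
}

# Possible usage on a machine that has these PMUs:
#   perf stat -a -e "$(hisi_pmu_event 3 l3c 0 rd_hit_cpipe)" sleep 5
#   cat "$(hisi_pmu_sysfs_dir 3 l3c 0)/cpumask"
```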

The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.

Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5

For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.
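
As a sketch of how such a bitmap can be derived, the helper below sets one bit
per core/thread index. The function name tt_core_mask is hypothetical, and the
assumption that bit N selects core/thread N is inferred from the example above,
where 0x3 selects core/thread 0 and 1.

```shell
# Hypothetical helper: build a tt_core bitmap from core/thread indices.
# Assumes bit N of the bitmap selects core/thread N within the cluster.
# tt_core_mask <index> [<index> ...]
tt_core_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '0x%x\n' "$mask"
}

# Possible usage on a machine that has this PMU:
#   perf stat -a -e "hisi_sccl3_l3c0/config=0x02,tt_core=$(tt_core_mask 0 1)/" sleep 5
```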

2. Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations; other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.
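
The tt_req encodings listed above can be kept in a small lookup helper so that
scripts do not hard-code the numbers. This is a sketch only: the function name
tt_req_code and the operation labels (read, write, atomic_store,
atomic_nonstore) are hypothetical; the numeric values are the ones given above.

```shell
# Hypothetical helper: map an operation label to its tt_req value.
# tt_req_code <read|write|atomic_store|atomic_nonstore>
tt_req_code() {
    case "$1" in
        read)            echo 0x4 ;;  # 3'b100
        write)           echo 0x5 ;;  # 3'b101
        atomic_store)    echo 0x6 ;;  # 3'b110
        atomic_nonstore) echo 0x7 ;;  # 3'b111
        *)
            # Other values are reserved, so refuse anything else.
            echo "tt_req_code: unknown operation '$1'" >&2
            return 1
            ;;
    esac
}

# Possible usage on a machine that has this PMU:
#   perf stat -a -e "hisi_sccl3_l3c0/config=0x02,tt_req=$(tt_req_code write)/" sleep 5
```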

3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

This is mainly useful for finding which data source is nearest to the CPU
cores. If datasrc_cfg is used on a multi-chip system, datasrc_skt shall also
be configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
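
The datasrc codes listed above can likewise be wrapped in a lookup helper. This
is a sketch: the function name datasrc_code and the source labels are
hypothetical, and only the six codes given above are covered.

```shell
# Hypothetical helper: map a data source label to its datasrc_cfg value.
# datasrc_code <label>
datasrc_code() {
    case "$1" in
        l3c_local_die)    echo 0x1  ;;  # 5'b00001: L3C in this die
        l3c_cross_die)    echo 0x8  ;;  # 5'b01000: L3C in the cross-die
        l3c_cross_socket) echo 0x9  ;;  # 5'b01001: L3C in another socket
        ddr_local)        echo 0xe  ;;  # 5'b01110: local DDR
        ddr_cross_die)    echo 0xf  ;;  # 5'b01111: cross-die DDR
        ddr_cross_socket) echo 0x10 ;;  # 5'b10000: cross-socket DDR
        *)
            echo "datasrc_code: unknown source '$1'" >&2
            return 1
            ;;
    esac
}

# Possible usage on a machine that has this PMU:
#   perf stat -a -e "hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=$(datasrc_code ddr_local)/" sleep 5
```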

4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, comprising a 6-bit SCCL-ID and a
5-bit CCL/ICL-ID. For an I/O die, the ICL-ID takes one of the following values:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;

5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
uring channel. It is 2 bits. Some important codes are as follows:

- 2'b11: count the events which are sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which are sent to the uring (non-MATA) channel;
- 2'b00: default value, count the events which are sent to both the uring and
  uring_ext channels;

Users can configure IDs to count data coming from a specific CCL/ICL by setting
srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by setting
tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU will not
check that bit when matching against srcid_cmd/tgtid_cmd.
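
The ID and mask arithmetic can be sketched in shell as follows. Note the
assumption: this document only states that the 11-bit ID contains a 6-bit
SCCL-ID and a 5-bit CCL/ICL-ID, so placing the SCCL-ID in the upper bits is a
guess that should be checked against the hardware manual. The function names
are hypothetical.

```shell
# Hypothetical helpers. ASSUMPTION: the 11-bit ID is laid out as
# (SCCL-ID << 5) | CCL/ICL-ID; the bit order is not stated in this document.

# hisi_cluster_id <sccl-id> <ccl-or-icl-id>
hisi_cluster_id() {
    printf '0x%x\n' $(( (($1 & 0x3f) << 5) | ($2 & 0x1f) ))
}

# srcid_any_cluster_in_sccl <sccl-id>
# Prints srcid options that match every CCL/ICL of one SCCL: the five
# CCL/ICL-ID bits are set in srcid_msk, so the PMU does not check them.
srcid_any_cluster_in_sccl() {
    printf 'srcid_cmd=%s,srcid_msk=0x1f\n' "$(hisi_cluster_id "$1" 0)"
}
```

Under that assumption, "hisi_cluster_id 3 0x10" would describe the PCIe_ICL of
SCCL 3, and leaving srcid_msk at 0 would require all 11 bits to match.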

If all of these options are disabled, the PMU works with the default values,
which apply no filter condition or ID matching, and the PMU counters return
the total counter values.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.