======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver registers a perf PMU for each device, such as
L3C, HHA and DDRC. The available events and configuration options are
described in sysfs, see:

/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command lists the available events from sysfs.

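The sysfs layout above can also be walked directly. The following is a minimal
sketch, assuming the usual perf convention that each PMU directory holds an
``events`` subdirectory with one file per named event; the helper name
``list_hisi_events`` is hypothetical and not part of the driver.

```python
from pathlib import Path


def list_hisi_events(root="/sys/bus/event_source/devices"):
    """Return {pmu_name: [event names]} for every hisi_sccl* PMU under root.

    Assumption: named events live in <root>/<pmu>/events/, as is usual for
    perf PMUs. A PMU without that directory is simply skipped.
    """
    result = {}
    for pmu_dir in sorted(Path(root).glob("hisi_sccl*")):
        events_dir = pmu_dir / "events"
        if events_dir.is_dir():
            result[pmu_dir.name] = sorted(p.name for p in events_dir.iterdir())
    return result
```

On a machine without these PMUs the function returns an empty dictionary.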
Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name appears in the event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is the READ_HIT_CPIPE event of L3C index #0
in SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is the RX_OPERATIONS event of HHA index #0
in SCCL ID #1.

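As an illustration of the naming scheme, a PMU name can be split back into its
parts. This is only a sketch: the regular expression and the function name
``parse_pmu_name`` are assumptions made for this example, based on the
hisi_sccl<sccl-id>_module<index-id> pattern described above.

```python
import re

# Module names such as "l3c" may contain digits, so the module part is
# required to end in a letter and the trailing digits form the index.
_PMU_NAME = re.compile(
    r"^hisi_sccl(?P<sccl>\d+)_(?P<module>[a-z0-9]*[a-z])(?P<index>\d+)$"
)


def parse_pmu_name(name):
    """Split a PMU name into (sccl_id, module, index)."""
    match = _PMU_NAME.match(name)
    if match is None:
        raise ValueError(f"not a HiSilicon uncore PMU name: {name!r}")
    return int(match["sccl"]), match["module"], int(match["index"])
```

For example, ``parse_pmu_name("hisi_sccl3_l3c0")`` yields SCCL ID 3, module
``l3c`` and index 0.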
The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.

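The attribute can be read like any other sysfs file. The sketch below assumes
the file holds a CPU list such as ``0`` or ``0-2,8`` (the list format commonly
used for perf cpumask attributes); both function names are hypothetical.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a CPU list string such as "0-2,8" into [0, 1, 2, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_pmu_cpus(pmu, root="/sys/bus/event_source/devices"):
    """Return the CPUs listed in <root>/<pmu>/cpumask (assumed list format)."""
    return parse_cpulist((Path(root) / pmu / "cpumask").read_text())
```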
Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5

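When many events are scripted, the event strings shown above can be assembled
programmatically. The helpers below are a hypothetical sketch that only builds
the ``pmu/term=value,.../`` string and the argument list; they do not run perf
and do not check that a term is valid for a given PMU.

```python
def event_spec(pmu, **terms):
    """Build a perf event string such as "hisi_sccl3_l3c0/config=0x2/"."""
    body = ",".join(f"{key}={value:#x}" for key, value in terms.items())
    return f"{pmu}/{body}/"


def perf_stat_argv(events, seconds=5):
    """Build the argument list for a system-wide perf stat over `sleep`."""
    return ["perf", "stat", "-a", "-e", ",".join(events), "sleep", str(seconds)]
```

The resulting list can be passed to ``subprocess.run`` on a system where perf
and the HiSilicon PMUs are available.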
For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

(a) L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.

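The bitmap can be derived from a list of core/thread indices. This sketch
assumes that bit N selects core/thread N within the cluster, which is
consistent with the example above (0x3 selects 0 and 1); the function name is
hypothetical and the width of the tt_core field is not checked.

```python
def tt_core_bitmap(cores):
    """Return the tt_core bitmap selecting the given core/thread indices.

    Assumption: bit N of tt_core corresponds to core/thread N in the cluster.
    """
    bitmap = 0
    for core in cores:
        if core < 0:
            raise ValueError(f"invalid core/thread index: {core}")
        bitmap |= 1 << core
    return bitmap
```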
(b) Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations; other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.

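The tt_req encodings listed above can be kept in one place so that scripts do
not hard-code the numbers. The enum and helper below are illustrative names
only; the values are the four encodings given in this document.

```python
from enum import IntEnum


class TtReq(IntEnum):
    """tt_req encodings from this document; other values are reserved."""
    READ = 0b100
    WRITE = 0b101
    ATOMIC_STORE = 0b110
    ATOMIC_NON_STORE = 0b111


def tt_req_term(kind):
    """Return the perf term for an operation kind, e.g. "read"."""
    return f"tt_req={TtReq[kind.upper()].value:#x}"
```

``tt_req_term("read")`` gives ``tt_req=0x4``, matching the command above.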
(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

It is mainly helpful for checking whether the data comes from the source
nearest to the CPU cores. If datasrc_cfg is used in a multi-chip system,
datasrc_skt shall be configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5

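To relate the hexadecimal values in the command to the codes in the list, the
table can be written out as a lookup. This is a sketch covering only the six
codes documented here; the names ``DATASRC`` and ``describe_datasrc`` are
hypothetical.

```python
# Only the codes listed in this document; other 5-bit values exist but are
# not described here.
DATASRC = {
    0b00001: "L3C in this die",
    0b01000: "L3C in the cross-die",
    0b01001: "L3C in another socket",
    0b01110: "local DDR",
    0b01111: "cross-die DDR",
    0b10000: "cross-socket DDR",
}


def describe_datasrc(code):
    """Return the documented meaning of a 5-bit datasrc_cfg value."""
    if not 0 <= code < 32:
        raise ValueError("datasrc_cfg is a 5-bit field")
    return DATASRC.get(code, "not listed in this document")
```

With this table, the two events in the command above (0xE and 0xF) correspond
to the local DDR and the cross-die DDR.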
(d) Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, including a 6-bit SCCL-ID and a 5-bit
CCL/ICL-ID. For an I/O die, the ICL-ID is as follows:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;

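The 11-bit ID can be pictured as two packed fields. The sketch below ASSUMES
that the 6-bit SCCL-ID occupies the upper bits and the 5-bit CCL/ICL-ID the
lower bits; the text above gives only the field widths, so the bit order should
be confirmed against the hardware documentation. The function name is
hypothetical.

```python
ICL_IDS = {
    "I/O_MGMT_ICL": 0b00000,
    "Network_ICL": 0b00001,
    "HAC_ICL": 0b00011,
    "PCIe_ICL": 0b10000,
}


def make_cluster_id(sccl_id, cluster_id):
    """Pack a 6-bit SCCL-ID and a 5-bit CCL/ICL-ID into an 11-bit ID.

    ASSUMPTION: SCCL-ID is placed in bits [10:5], CCL/ICL-ID in bits [4:0].
    """
    if not 0 <= sccl_id < (1 << 6):
        raise ValueError("SCCL-ID must fit in 6 bits")
    if not 0 <= cluster_id < (1 << 5):
        raise ValueError("CCL/ICL-ID must fit in 5 bits")
    return (sccl_id << 5) | cluster_id
```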
101*ea8d1c06SJunhao He(e) uring_channel: UC PMU events 0x47~0x59 supports filtering by tx request
102*ea8d1c06SJunhao Heuring channel. It is 2 bits. Some important codes are as follows:
103*ea8d1c06SJunhao He2'b11: count the events which sent to the uring_ext (MATA) channel;
104*ea8d1c06SJunhao He2'b01: is the same as 2'b11;
105*ea8d1c06SJunhao He2'b10: count the events which sent to the uring (non-MATA) channel;
106*ea8d1c06SJunhao He2'b00: default value, count the events which sent to the both uring and
107*ea8d1c06SJunhao He       uring_ext channel;
108*ea8d1c06SJunhao He
For the CCL/ICL IDs described in (d), users can configure IDs to count data
coming from a specific CCL/ICL by setting srcid_cmd & srcid_msk, and data
destined for a specific CCL/ICL by setting tgtid_cmd & tgtid_msk. A set bit in
srcid_msk/tgtid_msk means the PMU will not check the bit when matching against
the srcid_cmd/tgtid_cmd.

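The cmd/msk semantics can be modelled in software to check a filter before
using it. The function below is a sketch of the rule stated above (a set mask
bit is ignored during comparison) applied to 11-bit IDs; it models the match
and is not driver code, and its name is hypothetical.

```python
ID_BITS = 11
ID_MASK = (1 << ID_BITS) - 1


def id_matches(cluster_id, cmd, msk):
    """Return True if cluster_id matches cmd on every bit not set in msk."""
    differing = (cluster_id ^ cmd) & ID_MASK
    # Bits set in msk are not checked, so clear them before testing.
    return (differing & ~msk) == 0
```

For example, a mask of 0x1f ignores the low five bits, so any ID that shares
the remaining upper bits with the cmd value matches.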
If all of these options are disabled, the PMU works with the default values,
which apply no filter condition or ID matching, and returns the total counter
values in the PMU counters.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC and its information if needed.