======================================================
HiSilicon SoC uncore Performance Monitoring Unit (PMU)
======================================================

The HiSilicon SoC chip includes various independent system device PMUs
such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
independent and have hardware logic to gather statistics and performance
information.

The HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster
(CCL) is made up of 4 CPU cores sharing one L3 cache; each CPU die is
called a Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has
two HHAs (0 - 1) and four DDRCs (0 - 3).

HiSilicon SoC uncore PMU driver
-------------------------------

Each device PMU has separate registers for event counting, control and
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
HHA and DDRC etc. The available events and configuration options shall
be described in the sysfs, see:

/sys/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>/, or
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
The "perf list" command shall list the available events from sysfs.

Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
name will appear in the event listing as hisi_sccl<sccl-id>_module<index-id>,
where "sccl-id" is the identifier of the SCCL and "index-id" is the index of
the module.

e.g. hisi_sccl3_l3c0/rd_hit_cpipe is the READ_HIT_CPIPE event of L3C index #0
in SCCL ID #3.

e.g. hisi_sccl1_hha0/rx_operations is the RX_OPERATIONS event of HHA index #0
in SCCL ID #1.
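
The naming scheme above can be taken apart mechanically. The following is a
minimal sketch, not part of the driver: the helper names are hypothetical, the
module set is limited to the three modules named in this document, and the IDs
are assumed to be printed in decimal.

```python
import re

# Names of the form hisi_sccl<sccl-id>_<module><index-id>.
# Only l3c, hha and ddrc are accepted here because those are the modules
# this document names; IDs are assumed to be decimal.
_PMU_NAME = re.compile(r"^hisi_sccl(\d+)_(l3c|hha|ddrc)(\d+)$")


def parse_pmu_name(name):
    """Return (sccl_id, module, index_id), or None if the name does not fit."""
    match = _PMU_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


def parse_event(spec):
    """Split 'pmu/event/' (trailing slash optional) into (pmu_fields, event)."""
    pmu, _, event = spec.rstrip("/").partition("/")
    fields = parse_pmu_name(pmu)
    if fields is None or not event:
        return None
    return fields, event


print(parse_event("hisi_sccl3_l3c0/rd_hit_cpipe/"))
print(parse_event("hisi_sccl1_hha0/rx_operations"))
```

The first call yields SCCL 3, module l3c, index 0 and the event name
rd_hit_cpipe, matching the first example above.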

The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
ID used to count the uncore PMU event.
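
As an illustration, the attribute can be read from a script. This sketch
assumes the file holds a kernel-style CPU list (for example "0" or "0,24"),
which this document does not state; the function names and the default root
directory argument are choices made for the example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a CPU list such as '0-2,8' into [0, 1, 2, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A single number has no '-', so reuse it as the upper bound.
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def read_cpumask(pmu, root="/sys/bus/event_source/devices"):
    """Return the CPUs listed in <root>/<pmu>/cpumask."""
    return parse_cpu_list((Path(root) / pmu / "cpumask").read_text())
```

On a machine that exposes the PMU, read_cpumask("hisi_sccl3_l3c0") would
return the CPU or CPUs that perf uses for that PMU.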

Example usage of perf::

  $# perf list
  hisi_sccl3_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl3_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/rd_hit_cpipe/ [kernel PMU event]
  ------------------------------------------
  hisi_sccl1_l3c0/wr_hit_cpipe/ [kernel PMU event]
  ------------------------------------------

  $# perf stat -a -e hisi_sccl3_l3c0/rd_hit_cpipe/ sleep 5
  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02/ sleep 5

For HiSilicon uncore PMU v2, whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

(a) The L3C PMU supports filtering by core/thread within the cluster, which can
be specified as a bitmap::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.
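
The bitmap is one bit per core/thread, so tt_core=0x3 selects indexes 0 and 1.
A small sketch of that arithmetic follows; the helper names are hypothetical
and the width of the tt_core field is not checked, since this document does
not give it.

```python
def core_bitmap(cores):
    """Return a bitmap with bit N set for each core/thread index N."""
    bitmap = 0
    for core in cores:
        if core < 0:
            raise ValueError("core/thread index must be non-negative")
        bitmap |= 1 << core
    return bitmap


def l3c_core_event(pmu, config, cores):
    """Build an event string in the form used by the perf example above."""
    return f"{pmu}/config={config:#04x},tt_core={core_bitmap(cores):#x}/"


print(l3c_core_event("hisi_sccl3_l3c0", 0x02, [0, 1]))
# prints: hisi_sccl3_l3c0/config=0x02,tt_core=0x3/
```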

(b) Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3 bits: 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
3'b111 represents atomic non-store operations. Other values are reserved::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_req=0x4/ sleep 5

This will only count the read operations in this cluster.
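
The tt_req encodings listed above can be kept in a small table so that scripts
do not hard-code the numbers. This is only a sketch: the dictionary keys and
the helper name are invented for the example, while the values are the ones
given in this document.

```python
# tt_req encodings from the text above; the key names are illustrative.
TT_REQ = {
    "read": 0b100,
    "write": 0b101,
    "atomic_store": 0b110,
    "atomic_non_store": 0b111,
}


def l3c_req_event(pmu, config, operation):
    """Build an event string that counts only the given operation type."""
    if operation not in TT_REQ:
        raise ValueError(f"unknown operation: {operation!r}")
    return f"{pmu}/config={config:#04x},tt_req={TT_REQ[operation]:#x}/"


print(l3c_req_event("hisi_sccl3_l3c0", 0x02, "read"))
# prints: hisi_sccl3_l3c0/config=0x02,tt_req=0x4/
```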

(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:

- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;

Other codes exist. Datasrc is mainly helpful for checking whether the data
comes from the source nearest to the CPU cores. If datasrc_cfg is used on a
multi-chip system, datasrc_skt shall also be configured in the perf command::

  $# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
  hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
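
For scripting, the datasrc codes listed above can be held in a lookup table
and turned into a comma-separated event list like the one in the perf command.
This is a sketch under the assumption that only the six codes named in this
document are known; the helper names are hypothetical.

```python
# Codes taken from the list above; any other 5-bit value is left unnamed.
DATASRC = {
    0b00001: "L3C in this die",
    0b01000: "L3C in the cross-die",
    0b01001: "L3C in another socket",
    0b01110: "local DDR",
    0b01111: "cross-die DDR",
    0b10000: "cross-socket DDR",
}


def describe_datasrc(code):
    """Name a 5-bit datasrc code, or say that this document does not list it."""
    if not 0 <= code < 32:
        raise ValueError("datasrc is a 5-bit value")
    return DATASRC.get(code, "not listed in this document")


def datasrc_events(pmu, config, codes):
    """Join one event per datasrc code into a single perf -e argument."""
    return ",".join(
        f"{pmu}/config={config:#x},datasrc_cfg=0x{code:X}/" for code in codes
    )


print(describe_datasrc(0xE))
print(datasrc_events("hisi_sccl3_l3c0", 0xB9, [0xE, 0xF]))
```

The second call produces the same two events as the perf command above, joined
on one line.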

(d) Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) and contain multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11 bits, consisting of a 6-bit SCCL-ID and a
5-bit CCL/ICL-ID. For an I/O die, the ICL-ID is encoded as follows:

- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;
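
The 11-bit ID can be packed and unpacked with shifts and masks. The sketch
below assumes the SCCL-ID occupies the upper 6 bits and the CCL/ICL-ID the
lower 5 bits; this document gives the field widths but not their order, so the
layout is an assumption, and the function names are hypothetical.

```python
SCCL_BITS = 6
CL_BITS = 5

# ICL-ID encodings for an I/O die, as listed above.
ICL_NAMES = {
    0b00000: "I/O_MGMT_ICL",
    0b00001: "Network_ICL",
    0b00011: "HAC_ICL",
    0b10000: "PCIe_ICL",
}


def pack_id(sccl_id, cl_id):
    """Combine the two fields; ASSUMES the SCCL-ID sits in the upper bits."""
    if not 0 <= sccl_id < (1 << SCCL_BITS):
        raise ValueError("SCCL-ID must fit in 6 bits")
    if not 0 <= cl_id < (1 << CL_BITS):
        raise ValueError("CCL/ICL-ID must fit in 5 bits")
    return (sccl_id << CL_BITS) | cl_id


def unpack_id(value):
    """Split an 11-bit ID into (sccl_id, cl_id) under the same assumption."""
    if not 0 <= value < (1 << (SCCL_BITS + CL_BITS)):
        raise ValueError("ID must fit in 11 bits")
    return value >> CL_BITS, value & ((1 << CL_BITS) - 1)


full_id = pack_id(3, 0b10000)
print(hex(full_id), unpack_id(full_id), ICL_NAMES[unpack_id(full_id)[1]])
```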

Users can configure IDs to count data coming from a specific CCL/ICL by
setting srcid_cmd & srcid_msk, and data destined for a specific CCL/ICL by
setting tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU
will not check that bit when matching against srcid_cmd/tgtid_cmd.
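
The mask semantics can be modelled in software: bits set in the mask are
ignored, and all other bits must equal the command value. The sketch below
only illustrates that rule and is not the hardware's implementation; the
function name is hypothetical, and the sample IDs assume the SCCL-ID is in the
upper 6 bits of the 11-bit ID.

```python
ID_BITS = 11
ID_MASK = (1 << ID_BITS) - 1


def id_matches(candidate, cmd, msk):
    """True if candidate equals cmd on every bit that msk leaves clear."""
    care = ~msk & ID_MASK  # a set bit in msk means "do not check this bit"
    return (candidate & care) == (cmd & care)


# Ignore the lower 5 bits: any cluster whose upper 6 bits equal those of 0x60.
print(id_matches(0x65, 0x60, 0x1F))  # True
print(id_matches(0x45, 0x60, 0x1F))  # False
# With an all-zero mask every bit is checked, so only an exact match counts.
print(id_matches(0x65, 0x60, 0x00))  # False
```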

If all of these options are disabled, the default values apply: no filter
condition or ID information is distinguished, and the PMU counters return the
total counts.

The current driver does not support sampling, so "perf record" is unsupported.
Attaching to a task is also unsupported, as the events are all uncore.

Note: Please contact the maintainer for a complete list of events supported for
the PMU devices in the SoC, and for further information about them if needed.