=============================================================
Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU)
=============================================================

The Yitian 710, custom-built by Alibaba Group's chip development business,
T-Head, implements uncore PMUs for performance and functional debugging to
facilitate system maintenance.

DDR Sub-System Driveway (DRW) PMU Driver
=========================================

Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel
services system memory requests independently of the others, and each DDR5
channel is split into two independent sub-channels. The DDR Sub-System Driveway
implements a separate PMU for each sub-channel to monitor various performance
metrics.

The Driveway PMU devices are exposed to perf as ali_drw_<sys_base_addr>.
For example, ali_drw_21000 and ali_drw_21080 are the PMU devices for the two
sub-channels of the same channel on die 0. PMU devices on die 1 follow the
pattern ali_drw_400XXXXX, e.g. ali_drw_40021000.

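As a sketch, the helpers below list the Driveway PMUs that perf can see and
split each name into a die number and a base address. They assume the
standard sysfs directory /sys/bus/event_source/devices, and they treat any
base address at or above 0x40000000 as die 1, which is inferred from the
ali_drw_400XXXXX pattern rather than taken from the driver. The names
DIE1_BASE, EXPECTED_PMUS, parse_drw_name() and list_drw_pmus() are
hypothetical::

  import glob
  import os

  # Standard sysfs directory in which perf looks up PMU devices.
  EVENT_SOURCE_DIR = "/sys/bus/event_source/devices"

  # Assumption: die 1 addresses start here (from the ali_drw_400XXXXX names).
  DIE1_BASE = 0x40000000

  # 8 channels (4 per die) x 2 sub-channels, one PMU per sub-channel.
  EXPECTED_PMUS = 8 * 2


  def parse_drw_name(name):
      """Split "ali_drw_<sys_base_addr>" into (die, base_addr)."""
      prefix = "ali_drw_"
      if not name.startswith(prefix):
          raise ValueError(f"not a Driveway PMU: {name}")
      addr = int(name[len(prefix):], 16)
      die = 1 if addr >= DIE1_BASE else 0
      return die, addr


  def list_drw_pmus(root=EVENT_SOURCE_DIR):
      """Return the Driveway PMU names found under root, ordered by address."""
      names = [os.path.basename(p)
               for p in glob.glob(os.path.join(root, "ali_drw_*"))]
      return sorted(names, key=lambda n: parse_drw_name(n)[1])

With all eight channels populated as DDR5, len(list_drw_pmus()) would be
expected to equal EXPECTED_PMUS (16), and parse_drw_name("ali_drw_40021000")
returns (1, 0x40021000).
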
Each sub-channel has 36 PMU counters in total, which are classified into
four groups:

- Group 0: PMU Cycle Counter. This group has one pair of counters,
  pmu_cycle_cnt_low and pmu_cycle_cnt_high, which together provide the cycle
  count based on the DDRC core clock.

- Group 1: PMU Bandwidth Counters. This group has 8 counters that count the
  total number of accesses to either the eight bank groups in a selected
  rank, or the four ranks separately (using the first 4 counters). The base
  transfer unit is 64B.

- Group 2: PMU Retry Counters. This group has 10 counters that count the
  total number of retries for each type of uncorrectable error.

- Group 3: PMU Common Counters. This group has 16 counters that count the
  common events.

Currently, the Driveway PMU driver only uses the counters in groups 0 and 3.

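The counter total above works out because group 0 is a low/high register
pair and therefore accounts for two of the 36 counters. A short sketch of
that arithmetic, using the hypothetical names COUNTER_GROUPS and
DRIVER_GROUPS::

  # Counter groups of one sub-channel PMU: group -> (name, number of counters).
  COUNTER_GROUPS = {
      0: ("cycle", 2),       # pmu_cycle_cnt_low + pmu_cycle_cnt_high
      1: ("bandwidth", 8),
      2: ("retry", 10),
      3: ("common", 16),
  }

  # Groups used by the current driver.
  DRIVER_GROUPS = (0, 3)


  def total_counters(groups=COUNTER_GROUPS):
      """Total number of counters across the given groups."""
      return sum(count for _, count in groups.values())


  def driver_counters():
      """Number of counters in the groups the driver uses."""
      return sum(COUNTER_GROUPS[g][1] for g in DRIVER_GROUPS)

Here total_counters() is 2 + 8 + 10 + 16 = 36, and driver_counters() is
2 + 16 = 18, i.e. the cycle counter pair plus the 16 common counters.
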
The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution
for connecting an SoC application bus to DDR memory devices. The DDRCTL
receives transactions through the Host Interface (HIF), which is
custom-defined by Synopsys. These transactions are queued internally and
scheduled for access while satisfying the SDRAM protocol timing requirements,
transaction priorities, and dependencies between the transactions. The DDRCTL
in turn issues commands on the DDR PHY Interface (DFI) to the PHY module, which
launches and captures data to and from the SDRAM. The Driveway PMUs have
hardware logic to gather statistics and performance logging signals on HIF,
DFI, etc.

By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF
interface, the memory bandwidth can be calculated. Example usage for counting
memory data bandwidth::

  perf stat \
    -e ali_drw_21000/hif_wr/ \
    -e ali_drw_21000/hif_rd/ \
    -e ali_drw_21000/hif_rmw/ \
    -e ali_drw_21000/cycle/ \
    -e ali_drw_21080/hif_wr/ \
    -e ali_drw_21080/hif_rd/ \
    -e ali_drw_21080/hif_rmw/ \
    -e ali_drw_21080/cycle/ \
    -e ali_drw_23000/hif_wr/ \
    -e ali_drw_23000/hif_rd/ \
    -e ali_drw_23000/hif_rmw/ \
    -e ali_drw_23000/cycle/ \
    -e ali_drw_23080/hif_wr/ \
    -e ali_drw_23080/hif_rd/ \
    -e ali_drw_23080/hif_rmw/ \
    -e ali_drw_23080/cycle/ \
    -e ali_drw_25000/hif_wr/ \
    -e ali_drw_25000/hif_rd/ \
    -e ali_drw_25000/hif_rmw/ \
    -e ali_drw_25000/cycle/ \
    -e ali_drw_25080/hif_wr/ \
    -e ali_drw_25080/hif_rd/ \
    -e ali_drw_25080/hif_rmw/ \
    -e ali_drw_25080/cycle/ \
    -e ali_drw_27000/hif_wr/ \
    -e ali_drw_27000/hif_rd/ \
    -e ali_drw_27000/hif_rmw/ \
    -e ali_drw_27000/cycle/ \
    -e ali_drw_27080/hif_wr/ \
    -e ali_drw_27080/hif_rd/ \
    -e ali_drw_27080/hif_rmw/ \
    -e ali_drw_27080/cycle/ -- sleep 10

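Because the command above repeats the same four events for every
sub-channel, it can also be generated. The following sketch builds the same
argument list for the eight die 0 PMUs; build_perf_stat_cmd(),
run_perf_stat() and DIE0_PMUS are hypothetical names, and only the standard
perf stat options -e and -- are used::

  import subprocess

  # Die 0 Driveway PMUs from the example above.
  DIE0_PMUS = [
      "ali_drw_21000", "ali_drw_21080",
      "ali_drw_23000", "ali_drw_23080",
      "ali_drw_25000", "ali_drw_25080",
      "ali_drw_27000", "ali_drw_27080",
  ]

  # Events needed by the bandwidth formulas, in the order used above.
  BANDWIDTH_EVENTS = ("hif_wr", "hif_rd", "hif_rmw", "cycle")


  def build_perf_stat_cmd(pmus, events=BANDWIDTH_EVENTS,
                          workload=("sleep", "10")):
      """Return the perf stat argv with one -e option per PMU/event pair."""
      cmd = ["perf", "stat"]
      for pmu in pmus:
          for event in events:
              cmd += ["-e", f"{pmu}/{event}/"]
      cmd += ["--", *workload]
      return cmd


  def run_perf_stat(pmus):
      """Run the command; needs perf and the Driveway PMU driver."""
      return subprocess.run(build_perf_stat_cmd(pmus), check=True)

Calling run_perf_stat(DIE0_PMUS) runs the same command as above; die 1 PMUs
can be added to the list in the same way.
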
The average DRAM bandwidth can be calculated as follows:

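- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle

Here, perf_hif_rd, perf_hif_wr, perf_hif_rmw and DDRC_Cycle are the hif_rd,
hif_wr, hif_rmw and cycle counts reported by perf for a sub-channel,
DDRC_Freq is the DDRC core clock frequency, and DDRC_WIDTH = 64 bytes. Since
DDRC_Cycle / DDRC_Freq is the measurement interval in seconds, each formula
is the number of bytes transferred divided by the elapsed time.
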
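As a worked example of these formulas, the sketch below converts the counts
for one sub-channel into bytes per second. The DDRC core clock frequency is
not given in this document and must be supplied by the caller; summing the
per-sub-channel results to obtain a system-wide figure is an assumption of
this sketch, and drw_bandwidth() and total_bandwidth() are hypothetical
names::

  DDRC_WIDTH = 64  # bytes per HIF command


  def drw_bandwidth(hif_rd, hif_wr, hif_rmw, cycles, ddrc_freq_hz):
      """Return (read, write) bandwidth in bytes/s for one sub-channel."""
      if cycles <= 0:
          raise ValueError("cycle count must be positive")
      read_bw = hif_rd * DDRC_WIDTH * ddrc_freq_hz / cycles
      write_bw = (hif_wr + hif_rmw) * DDRC_WIDTH * ddrc_freq_hz / cycles
      return read_bw, write_bw


  def total_bandwidth(samples, ddrc_freq_hz):
      """Sum (read, write) bandwidth over per-sub-channel samples.

      Each sample is (hif_rd, hif_wr, hif_rmw, cycles) for one PMU.
      """
      read_total = write_total = 0.0
      for hif_rd, hif_wr, hif_rmw, cycles in samples:
          rd, wr = drw_bandwidth(hif_rd, hif_wr, hif_rmw, cycles, ddrc_freq_hz)
          read_total += rd
          write_total += wr
      return read_total, write_total

For instance, with a hypothetical 1 GHz DDRC clock and 10^9 cycles (one
second), 1,000,000 hif_rd, 400,000 hif_wr and 100,000 hif_rmw commands give
64,000,000 B/s of read and 32,000,000 B/s of write bandwidth.
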
The current driver does not support sampling, so "perf record" is
unsupported. Attaching to a task is also unsupported, as all the events are
uncore events.