=============================================================
Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU)
=============================================================

The Yitian 710, custom-built by T-Head, Alibaba Group's chip development
business, implements an uncore PMU for performance and functional debugging
to facilitate system maintenance.

DDR Sub-System Driveway (DRW) PMU Driver
=========================================

The Yitian 710 employs eight DDR5/DDR4 channels, four on each die. Each DDR5
channel services system memory requests independently of the others and is
split into two independent sub-channels. The DDR Sub-System Driveway
implements a separate PMU for each sub-channel to monitor various performance
metrics.

In perf, the Driveway PMU devices are named ali_drw_<sys_base_addr>.
For example, ali_drw_21000 and ali_drw_21080 are the PMU devices for the two
sub-channels of the same channel on die 0. PMU devices on die 1 are named
ali_drw_400XXXXX, e.g. ali_drw_40021000.
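The naming scheme above can be illustrated with a small Python sketch that
splits a device name into its parts. It is not part of the driver. The helper
name ``parse_drw_name`` is hypothetical, and the decoding rules are assumptions
inferred only from the example names above: the suffix is read as a hexadecimal
number, die 1 is detected by the 0x400XXXXX prefix, and the two sub-channels of
a channel are assumed to differ by an offset of 0x80.

```python
def parse_drw_name(name: str) -> dict:
    """Split a Driveway PMU device name into its (assumed) components."""
    prefix = "ali_drw_"
    if not name.startswith(prefix):
        raise ValueError(f"not a Driveway PMU device name: {name!r}")

    # Assumption: <sys_base_addr> is printed in hexadecimal without "0x".
    base = int(name[len(prefix):], 16)

    # Assumption: die 1 devices carry the 0x400XXXXX prefix (bit 30 set).
    die = 1 if base & 0x40000000 else 0

    # Assumption: the second sub-channel of a channel sits at offset 0x80,
    # as in the pair ali_drw_21000 / ali_drw_21080. The 0/1 numbering is
    # arbitrary and only used to tell the two sub-channels apart.
    sub_channel = 1 if base & 0x80 else 0

    return {"base": base, "die": die, "sub_channel": sub_channel}


if __name__ == "__main__":
    for dev in ("ali_drw_21000", "ali_drw_21080", "ali_drw_40021000"):
        print(dev, parse_drw_name(dev))
```

Such a helper is only useful for grouping perf output by die or channel.
The names actually present on a system should be taken from perf itself.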

Each sub-channel has 36 PMU counters in total, which are classified into
four groups:

- Group 0: PMU Cycle Counter. This group has one pair of counters,
  pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count
  based on the DDRC core clock.

- Group 1: PMU Bandwidth Counters. This group has 8 counters that count the
  total number of accesses to either the eight bank groups in a selected
  rank, or the four ranks separately in the first 4 counters. The base
  transfer unit is 64B.

- Group 2: PMU Retry Counters. This group has 10 counters that count the
  total number of retries for each type of uncorrectable error.

- Group 3: PMU Common Counters. This group has 16 counters that count the
  common events.

For now, the Driveway PMU driver only uses the counters in group 0 and
group 3.
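As a quick cross-check of the counter layout described above, the following
Python sketch tallies the groups. It assumes that the low/high pair in group 0
counts as two of the 36 counters, which is the only reading that makes the
group sizes add up. The table name ``COUNTER_GROUPS`` is purely illustrative.

```python
# group id -> (description, number of counters, used by the current driver)
COUNTER_GROUPS = {
    0: ("PMU Cycle Counter", 2, True),        # pmu_cycle_cnt_low/high pair
    1: ("PMU Bandwidth Counters", 8, False),
    2: ("PMU Retry Counters", 10, False),
    3: ("PMU Common Counters", 16, True),
}

# Total counters per sub-channel, and how many the driver currently uses.
total = sum(count for _, count, _ in COUNTER_GROUPS.values())
used = sum(count for _, count, in_use in COUNTER_GROUPS.values() if in_use)

print(f"total counters per sub-channel: {total}")
print(f"counters used by the driver:    {used}")
```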

The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution
for connecting an SoC application bus to DDR memory devices. The DDRCTL
receives transactions from the Host Interface (HIF), which is custom-defined
by Synopsys. These transactions are queued internally and scheduled for access
while satisfying the SDRAM protocol timing requirements, transaction
priorities, and dependencies between the transactions. The DDRCTL in turn
issues commands on the DDR PHY Interface (DFI) to the PHY module, which
launches and captures data to and from the SDRAM. The Driveway PMUs have
hardware logic to gather statistics and performance logging signals on HIF,
DFI, etc.

The bandwidth can be calculated by counting the READ, WRITE and RMW commands
sent to the DDRC through the HIF. Example usage of counting memory data
bandwidth::

  perf stat \
    -e ali_drw_21000/hif_wr/ \
    -e ali_drw_21000/hif_rd/ \
    -e ali_drw_21000/hif_rmw/ \
    -e ali_drw_21000/cycle/ \
    -e ali_drw_21080/hif_wr/ \
    -e ali_drw_21080/hif_rd/ \
    -e ali_drw_21080/hif_rmw/ \
    -e ali_drw_21080/cycle/ \
    -e ali_drw_23000/hif_wr/ \
    -e ali_drw_23000/hif_rd/ \
    -e ali_drw_23000/hif_rmw/ \
    -e ali_drw_23000/cycle/ \
    -e ali_drw_23080/hif_wr/ \
    -e ali_drw_23080/hif_rd/ \
    -e ali_drw_23080/hif_rmw/ \
    -e ali_drw_23080/cycle/ \
    -e ali_drw_25000/hif_wr/ \
    -e ali_drw_25000/hif_rd/ \
    -e ali_drw_25000/hif_rmw/ \
    -e ali_drw_25000/cycle/ \
    -e ali_drw_25080/hif_wr/ \
    -e ali_drw_25080/hif_rd/ \
    -e ali_drw_25080/hif_rmw/ \
    -e ali_drw_25080/cycle/ \
    -e ali_drw_27000/hif_wr/ \
    -e ali_drw_27000/hif_rd/ \
    -e ali_drw_27000/hif_rmw/ \
    -e ali_drw_27000/cycle/ \
    -e ali_drw_27080/hif_wr/ \
    -e ali_drw_27080/hif_rd/ \
    -e ali_drw_27080/hif_rmw/ \
    -e ali_drw_27080/cycle/ -- sleep 10
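The event list above repeats the same four events for every sub-channel PMU,
so it can be generated instead of typed by hand. The Python sketch below
rebuilds the same command line. The helper name ``build_perf_stat`` and the
``DIE0_DEVICES`` list are hypothetical, and the device names are simply the
eight names used in the example; a real script would use whatever ali_drw_*
devices perf reports on the machine.

```python
import shlex

# The four events used in the example above.
EVENTS = ("hif_wr", "hif_rd", "hif_rmw", "cycle")

# The eight device names from the example: four channels, two sub-channels
# each. The 0x80 sub-channel offset is read off those names only.
DIE0_DEVICES = [
    f"ali_drw_{base:x}"
    for channel in (0x21000, 0x23000, 0x25000, 0x27000)
    for base in (channel, channel + 0x80)
]


def build_perf_stat(devices, events=EVENTS, workload=("sleep", "10")):
    """Return the argv list for a 'perf stat' run over all devices/events."""
    argv = ["perf", "stat"]
    for dev in devices:
        for event in events:
            argv += ["-e", f"{dev}/{event}/"]
    argv += ["--", *workload]
    return argv


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works on
    # machines without the Driveway PMU.
    print(shlex.join(build_perf_stat(DIE0_DEVICES)))
```

The returned list can be passed to ``subprocess.run()`` on a system where the
devices exist.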

Example usage of counting all memory read/write bandwidth by metric::

  perf stat -M ddr_read_bandwidth.all -- sleep 10
  perf stat -M ddr_write_bandwidth.all -- sleep 10

The average DRAM bandwidth can be calculated as follows:

- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle

Here, DDRC_WIDTH = 64 bytes, DDRC_Cycle is the value of the cycle event, and
DDRC_Freq is the DDRC core clock frequency, so DDRC_Cycle / DDRC_Freq is the
elapsed time of the measurement.
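The two formulas above translate directly into code. The following Python
sketch is only an illustration: the function name ``ddr_bandwidth`` is
hypothetical, and the counter values and the 1 GHz DDRC clock in the usage
part are made-up numbers chosen for easy arithmetic, not measurements from a
Yitian 710.

```python
DDRC_WIDTH = 64  # bytes transferred per HIF command, as stated above


def ddr_bandwidth(hif_rd, hif_wr, hif_rmw, cycle, ddrc_freq_hz):
    """Return (read, write) bandwidth in bytes/second for one sub-channel.

    hif_rd, hif_wr, hif_rmw and cycle are the event counts reported by
    perf stat; ddrc_freq_hz is the DDRC core clock frequency in Hz.
    """
    if cycle <= 0:
        raise ValueError("cycle count must be positive")
    read_bw = hif_rd * DDRC_WIDTH * ddrc_freq_hz / cycle
    write_bw = (hif_wr + hif_rmw) * DDRC_WIDTH * ddrc_freq_hz / cycle
    return read_bw, write_bw


if __name__ == "__main__":
    # Hypothetical counts for a 10 second run at an assumed 1 GHz DDRC clock.
    read_bw, write_bw = ddr_bandwidth(
        hif_rd=1_000_000_000,
        hif_wr=400_000_000,
        hif_rmw=100_000_000,
        cycle=10_000_000_000,
        ddrc_freq_hz=1_000_000_000,
    )
    print(f"read:  {read_bw / 1e9:.1f} GB/s")
    print(f"write: {write_bw / 1e9:.1f} GB/s")
```

The result is per sub-channel PMU. A system-wide figure would be the sum over
all ali_drw_* devices.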

The current driver does not support sampling, so "perf record" is
unsupported. Attaching to a task is also unsupported, as all the events are
uncore.
106