=========================================================
NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
=========================================================

The NVIDIA Tegra SoC includes various system PMUs to measure key performance
metrics like memory bandwidth, latency, and utilization:

* Scalable Coherency Fabric (SCF)
* NVLink-C2C0
* NVLink-C2C1
* CNVLink
* PCIE

PMU Driver
----------

The PMUs in this document are based on ARM CoreSight PMU Architecture as
described in document: ARM IHI 0091. Since this is a standard architecture, the
PMUs are managed by a common driver "arm-cs-arch-pmu". This driver describes
the available events and configuration of each PMU in sysfs. Please see the
sections below to get the sysfs path of each PMU. Like other uncore PMU drivers,
the driver provides a "cpumask" sysfs attribute to show the CPU id used to handle
the PMU event. There is also an "associated_cpus" sysfs attribute, which contains
a list of CPUs associated with the PMU instance.

.. _SCF_PMU_Section:

SCF PMU
-------

The SCF PMU monitors system level cache events, CPU traffic, and
strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
:ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about the PMU
traffic coverage.

The events and configuration options of this PMU device are described in sysfs,
see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.

Example usage:

* Count event id 0x0 in socket 0::

   perf stat -a -e nvidia_scf_pmu_0/event=0x0/

* Count event id 0x0 in socket 1::

   perf stat -a -e nvidia_scf_pmu_1/event=0x0/

NVLink-C2C0 PMU
---------------

The NVLink-C2C0 PMU monitors incoming traffic from a GPU/CPU connected with
NVLink-C2C (Chip-2-Chip) interconnect. The type of traffic captured by this PMU
varies depending on the chip configuration:

* NVIDIA Grace Hopper Superchip: Hopper GPU is connected with Grace SoC.

  In this config, the PMU captures GPU ATS translated or EGM traffic from the GPU.

* NVIDIA Grace CPU Superchip: two Grace CPU SoCs are connected.

  In this config, the PMU captures reads and relaxed ordered (RO) writes from
  the PCIE devices of the remote SoC.

Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.

The events and configuration options of this PMU device are described in sysfs,
see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.

Example usage:

* Count event id 0x0 from the GPU/CPU connected with socket 0::

   perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0/

* Count event id 0x0 from the GPU/CPU connected with socket 1::

   perf stat -a -e nvidia_nvlink_c2c0_pmu_1/event=0x0/

* Count event id 0x0 from the GPU/CPU connected with socket 2::

   perf stat -a -e nvidia_nvlink_c2c0_pmu_2/event=0x0/

* Count event id 0x0 from the GPU/CPU connected with socket 3::

   perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/

NVLink-C2C1 PMU
---------------

The NVLink-C2C1 PMU monitors incoming traffic from a GPU connected with
NVLink-C2C (Chip-2-Chip) interconnect.
This PMU captures untranslated GPU
traffic, in contrast with the NVLink-C2C0 PMU, which captures ATS translated
traffic.
Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
the PMU traffic coverage.

The events and configuration options of this PMU device are described in sysfs,
see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.

Example usage:

* Count event id 0x0 from the GPU connected with socket 0::

   perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0/

* Count event id 0x0 from the GPU connected with socket 1::

   perf stat -a -e nvidia_nvlink_c2c1_pmu_1/event=0x0/

* Count event id 0x0 from the GPU connected with socket 2::

   perf stat -a -e nvidia_nvlink_c2c1_pmu_2/event=0x0/

* Count event id 0x0 from the GPU connected with socket 3::

   perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/

CNVLink PMU
-----------

The CNVLink PMU monitors traffic from GPU and PCIE devices on remote sockets
to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
(RO) write traffic.
Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section`
for more info about the PMU traffic coverage.

The events and configuration options of this PMU device are described in sysfs,
see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.

Each SoC socket can be connected to one or more sockets via CNVLink. The user can
use the "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
Each bit represents a socket number, e.g. "rem_socket=0xE" corresponds to
sockets 1 to 3.
/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
shows the valid bits that can be set in the "rem_socket" parameter.

The PMU cannot distinguish the remote traffic initiator, therefore it does not
provide a filter to select the traffic source to monitor. It reports combined
traffic from remote GPU and PCIE devices.
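
The bitmap arithmetic for "rem_socket" can be sketched in shell. This is only
an illustration and not part of the driver or of perf: the helper name
"rem_socket_mask" is hypothetical, and a shell with POSIX arithmetic expansion
is assumed. It sets bit N for each socket number N passed as an argument.

```shell
# Hypothetical helper: build a "rem_socket" bitmap from socket numbers.
# Bit N of the result is set for each socket number N given as an argument.
rem_socket_mask() {
    mask=0
    for socket in "$@"; do
        mask=$(( mask | (1 << socket) ))
    done
    printf '0x%X\n' "$mask"
}

# Remote sockets 1, 2 and 3 give the 0xE value used in the text.
rem_socket_mask 1 2 3
```

The printed value could then be passed to perf as
"rem_socket=$(rem_socket_mask 1 2 3)". The bits that are actually valid remain
those reported by the format/rem_socket file in sysfs.
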

Example usage:

* Count event id 0x0 for the traffic from remote socket 1, 2, and 3 to socket 0::

   perf stat -a -e nvidia_cnvlink_pmu_0/event=0x0,rem_socket=0xE/

* Count event id 0x0 for the traffic from remote socket 0, 2, and 3 to socket 1::

   perf stat -a -e nvidia_cnvlink_pmu_1/event=0x0,rem_socket=0xD/

* Count event id 0x0 for the traffic from remote socket 0, 1, and 3 to socket 2::

   perf stat -a -e nvidia_cnvlink_pmu_2/event=0x0,rem_socket=0xB/

* Count event id 0x0 for the traffic from remote socket 0, 1, and 2 to socket 3::

   perf stat -a -e nvidia_cnvlink_pmu_3/event=0x0,rem_socket=0x7/


PCIE PMU
--------

The PCIE PMU monitors all read/write traffic from PCIE root ports to
local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section`
for more info about the PMU traffic coverage.

The events and configuration options of this PMU device are described in sysfs,
see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.

Each SoC socket can support multiple root ports.
The user can use the
"root_port" bitmap parameter to select the port(s) to monitor, e.g.
"root_port=0xF" corresponds to root ports 0 to 3.
/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
shows the valid bits that can be set in the "root_port" parameter.

Example usage:

* Count event id 0x0 from root port 0 and 1 of socket 0::

   perf stat -a -e nvidia_pcie_pmu_0/event=0x0,root_port=0x3/

* Count event id 0x0 from root port 0 and 1 of socket 1::

   perf stat -a -e nvidia_pcie_pmu_1/event=0x0,root_port=0x3/

.. _NVIDIA_Uncore_PMU_Traffic_Coverage_Section:

Traffic Coverage
----------------

The PMU traffic coverage may vary depending on the chip configuration:

* **NVIDIA Grace Hopper Superchip**: Hopper GPU is connected with Grace SoC.

  Example configuration with two Grace SoCs::

   *********************************          *********************************
   * SOCKET-A                      *          * SOCKET-B                      *
   *                               *          *                               *
   *                     ::::::::  *          *  ::::::::                     *
   *                     : PCIE :  *          *  : PCIE :                     *
   *                     ::::::::  *          *  ::::::::                     *
   *                         |     *          *      |                        *
   *                         |     *          *      |                        *
   *  :::::::            ::::::::: *          *  :::::::::            ::::::: *
   *  :     :            :       : *          *  :       :            :     : *
   *  : GPU :<--NVLink-->: Grace :<---CNVLink--->: Grace :<--NVLink-->: GPU : *
   *  :     :    C2C     :  SoC  : *          *  :  SoC  :     C2C    :     : *
   *  :::::::            ::::::::: *          *  :::::::::            ::::::: *
   *     |                   |     *          *      |                   |    *
   *     |                   |     *          *      |                   |    *
   *  &&&&&&&&           &&&&&&&&  *          *   &&&&&&&&           &&&&&&&& *
   *  & GMEM &           & CMEM &  *          *   & CMEM &           & GMEM & *
   *  &&&&&&&&           &&&&&&&&  *          *   &&&&&&&&           &&&&&&&& *
   *                               *          *                               *
   *********************************          *********************************

  GMEM = GPU Memory (e.g. HBM)
  CMEM = CPU Memory (e.g.
  LPDDR5X)

  |
  | The following table shows the traffic coverage of the Grace SoC PMUs in socket-A:

  ::

   +--------------+-------+-----------+-----------+-----+----------+----------+
   |              |                          Source                           |
   +              +-------+-----------+-----------+-----+----------+----------+
   | Destination  |       |GPU ATS    |GPU Not-ATS|     | Socket-B | Socket-B |
   |              |PCI R/W|Translated,|Translated | CPU | CPU/PCIE1| GPU/PCIE2|
   |              |       |EGM        |           |     |          |          |
   +==============+=======+===========+===========+=====+==========+==========+
   | Local        | PCIE  |NVLink-C2C0|NVLink-C2C1| SCF | SCF PMU  | CNVLink  |
   | SYSRAM/CMEM  | PMU   |PMU        |PMU        | PMU |          | PMU      |
   +--------------+-------+-----------+-----------+-----+----------+----------+
   | Local GMEM   | PCIE  |    N/A    |NVLink-C2C1| SCF | SCF PMU  | CNVLink  |
   |              | PMU   |           |PMU        | PMU |          | PMU      |
   +--------------+-------+-----------+-----------+-----+----------+----------+
   | Remote       | PCIE  |NVLink-C2C0|NVLink-C2C1| SCF |          |          |
   | SYSRAM/CMEM  | PMU   |PMU        |PMU        | PMU |   N/A    |   N/A    |
   | over CNVLink |       |           |           |     |          |          |
   +--------------+-------+-----------+-----------+-----+----------+----------+
   | Remote GMEM  | PCIE  |NVLink-C2C0|NVLink-C2C1| SCF |          |          |
   | over CNVLink | PMU   |PMU        |PMU        | PMU |   N/A    |   N/A    |
   +--------------+-------+-----------+-----------+-----+----------+----------+

  PCIE1 traffic represents strongly ordered (SO) writes.
  PCIE2 traffic represents reads and relaxed ordered (RO) writes.

* **NVIDIA Grace CPU Superchip**: two Grace CPU SoCs are connected.

  Example configuration with two Grace SoCs::

   *******************             *******************
   * SOCKET-A        *             * SOCKET-B        *
   *                 *             *                 *
   *    ::::::::     *             *    ::::::::     *
   *    : PCIE :     *             *    : PCIE :     *
   *    ::::::::     *             *    ::::::::     *
   *        |        *             *        |        *
   *        |        *             *        |        *
   *    :::::::::    *             *    :::::::::    *
   *    :       :    *             *    :       :    *
   *    : Grace :<--------NVLink------->: Grace :    *
   *    :  SoC  :    *     C2C     *    :  SoC  :    *
   *    :::::::::    *             *    :::::::::    *
   *        |        *             *        |        *
   *        |        *             *        |        *
   *    &&&&&&&&     *             *    &&&&&&&&     *
   *    & CMEM &     *             *    & CMEM &     *
   *    &&&&&&&&     *             *    &&&&&&&&     *
   *                 *             *                 *
   *******************             *******************

  GMEM = GPU Memory (e.g. HBM)
  CMEM = CPU Memory (e.g.
  LPDDR5X)

  |
  | The following table shows the traffic coverage of the Grace SoC PMUs in socket-A:

  ::

   +-----------------+-----------+---------+----------+-------------+
   |                 |                    Source                    |
   +                 +-----------+---------+----------+-------------+
   | Destination     |           |         | Socket-B | Socket-B    |
   |                 |  PCI R/W  |   CPU   | CPU/PCIE1| PCIE2       |
   |                 |           |         |          |             |
   +=================+===========+=========+==========+=============+
   | Local           | PCIE PMU  | SCF PMU | SCF PMU  | NVLink-C2C0 |
   | SYSRAM/CMEM     |           |         |          | PMU         |
   +-----------------+-----------+---------+----------+-------------+
   | Remote          |           |         |          |             |
   | SYSRAM/CMEM     | PCIE PMU  | SCF PMU |   N/A    |     N/A     |
   | over NVLink-C2C |           |         |          |             |
   +-----------------+-----------+---------+----------+-------------+

  PCIE1 traffic represents strongly ordered (SO) writes.
  PCIE2 traffic represents reads and relaxed ordered (RO) writes.
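
As an illustration of reading the coverage tables, the sketch below assembles
(and only prints) a perf command line with one event per PMU listed in the
"Local SYSRAM/CMEM" row of the Grace Hopper table. It rests on several
assumptions: the function name "local_cmem_events" is hypothetical, event id
0x0 is the same placeholder used in the usage examples, socket-A is taken to be
socket 0 with socket-B as socket 1, and the "root_port" and "rem_socket" values
should be checked against the format files in sysfs.

```shell
# Hypothetical helper: list one event per PMU that covers traffic into the
# local SYSRAM/CMEM of a socket (first row of the Grace Hopper table).
# Arguments: socket id, root_port bitmap, rem_socket bitmap.
local_cmem_events() {
    socket=$1 root_port=$2 rem_socket=$3
    # Each of these event strings is printed followed by a comma.
    printf '%s,' \
        "nvidia_pcie_pmu_${socket}/event=0x0,root_port=${root_port}/" \
        "nvidia_nvlink_c2c0_pmu_${socket}/event=0x0/" \
        "nvidia_nvlink_c2c1_pmu_${socket}/event=0x0/" \
        "nvidia_scf_pmu_${socket}/event=0x0/"
    # The last event ends the list without a trailing comma.
    printf '%s\n' "nvidia_cnvlink_pmu_${socket}/event=0x0,rem_socket=${rem_socket}/"
}

# Assumed values: socket 0, root ports 0 and 1, remote socket 1.
echo "perf stat -a -e $(local_cmem_events 0 0x3 0x2)"
```

Printing the command instead of running it keeps the sketch usable on machines
without these PMUs. On a real system the event ids would be replaced with the
ones listed under each device's events directory.
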