History log of /openbmc/dbus-sensors/src/nvidia-gpu/NvidiaGpuSensorMain.cpp
# 560e6af7 21-Apr-2025 Harshit Aghera <haghera@nvidia.com>

nvidia-gpu: add support for communication to the endpoint

This commit uses the MCTP VDM protocol to read the temperature sensor
value from the GPU.

The MCTP VDM protocol is an extension of the OCP Accelerator Management
Interface specification. [1]
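At the transport level, an MCTP vendor-defined message begins with a generic header before any OCP-specific fields. The following is a minimal C++ sketch assuming only the generic "Vendor Defined - PCI" layout from DMTF DSP0236/DSP0239 (message type 0x7E followed by a 16-bit PCI vendor ID, most significant byte first), with NVIDIA's PCI vendor ID 0x10DE assumed as the vendor identifier. The helper `wrapVendorDefinedPci` is hypothetical, and its payload argument stands in for the OCP-defined fields, which are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// MCTP message type for "Vendor Defined - PCI" messages (DMTF DSP0239).
constexpr uint8_t mctpMsgTypeVendorDefinedPci = 0x7E;

// NVIDIA's PCI vendor ID, assumed here to identify the vendor-defined payload.
constexpr uint16_t nvidiaPciVendorId = 0x10DE;

// Hypothetical helper: prepend the generic MCTP vendor-defined (PCI) header to
// an opaque vendor payload. The vendor ID is written most significant byte
// first; the payload layout itself is defined by the OCP spec (not modeled).
std::vector<uint8_t> wrapVendorDefinedPci(uint16_t vendorId,
                                          const std::vector<uint8_t>& payload)
{
    std::vector<uint8_t> msg;
    msg.reserve(3 + payload.size());
    msg.push_back(mctpMsgTypeVendorDefinedPci);
    msg.push_back(static_cast<uint8_t>(vendorId >> 8));
    msg.push_back(static_cast<uint8_t>(vendorId & 0xFF));
    msg.insert(msg.end(), payload.begin(), payload.end());
    return msg;
}
```

For example, wrapping the placeholder payload `{0xAA, 0xBB}` with `nvidiaPciVendorId` yields the bytes `7E 10 DE AA BB`.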

Tested: Build an image for the gb200nvl-obmc machine with the following
patches cherry-picked. These patches are needed to enable the MCTP stack.

https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422

Start the nvidiagpusensor service.
```
root@gb200nvl-obmc:~# systemctl start xyz.openbmc_project.nvidiagpusensor.service
```

The app detects the entity-manager configuration on the gb200nvl-obmc
machine. It also detects all the endpoints in the MCTP service's D-Bus
tree. It reads the GPU temperature sensor value correctly, and the
temperature sensor is also present on Redfish.
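The readings in the output below (36.4375 and 36.3125) are exact multiples of 1/16, which suggests the GPU reports temperature in a fixed-point format. A minimal sketch of the conversion, assuming a signed value with 8 fractional bits; both that encoding and the name `decodeFixedPointTemperature` are assumptions to be checked against the OCP specification, not taken from the app.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: signed fixed point with 8 fractional bits, so one LSB is
// 1/256 degC. Verify against the OCP specification before relying on it.
constexpr int fractionalBits = 8;

// Hypothetical decoder: convert a raw fixed-point reading to degrees Celsius.
double decodeFixedPointTemperature(int32_t raw)
{
    return static_cast<double>(raw) /
           static_cast<double>(1 << fractionalBits);
}
```

Under this assumption, a raw value of 9328 decodes to 36.4375 and 9296 decodes to 36.3125.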

```
$ curl -k -u 'root:0penBmc' https://10.137.203.137/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU
{
  "@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU",
  "@odata.type": "#Sensor.v1_2_0.Sensor",
  "Id": "temperature_NVIDIA_GB200_GPU",
  "Name": "NVIDIA GB200 GPU",
  "Reading": 36.4375,
  "ReadingRangeMax": 127.0,
  "ReadingRangeMin": -128.0,
  "ReadingType": "Temperature",
  "ReadingUnits": "Cel",
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  }
}

root@gb200nvl-obmc:~# busctl tree xyz.openbmc_project.GpuSensor
└─ /xyz
  └─ /xyz/openbmc_project
    └─ /xyz/openbmc_project/sensors
      └─ /xyz/openbmc_project/sensors/temperature
        └─ /xyz/openbmc_project/sensors/temperature/NVIDIA_GB200_GPU

root@gb200nvl-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/sensors/temperature/NVIDIA_GB200_GPU
NAME                                                  TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable                   interface -         -                                        -
.Introspect                                           method    -         s                                        -
org.freedesktop.DBus.Peer                             interface -         -                                        -
.GetMachineId                                         method    -         s                                        -
.Ping                                                 method    -         -                                        -
org.freedesktop.DBus.Properties                       interface -         -                                        -
.Get                                                  method    ss        v                                        -
.GetAll                                               method    s         a{sv}                                    -
.Set                                                  method    ssv       -                                        -
.PropertiesChanged                                    signal    sa{sv}as  -                                        -
xyz.openbmc_project.Association.Definitions           interface -         -                                        -
.Associations                                         property  a(sss)    1 "chassis" "all_sensors" "/xyz/openbmc… emits-change
xyz.openbmc_project.Sensor.Value                      interface -         -                                        -
.MaxValue                                             property  d         127                                      emits-change
.MinValue                                             property  d         -128                                     emits-change
.Unit                                                 property  s         "xyz.openbmc_project.Sensor.Value.Unit.… emits-change
.Value                                                property  d         36.3125                                  emits-change writable
xyz.openbmc_project.Sensor.ValueMutability            interface -         -                                        -
.Mutable                                              property  b         true                                     emits-change
xyz.openbmc_project.State.Decorator.Availability      interface -         -                                        -
.Available                                            property  b         true                                     emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface -         -                                        -
.Functional                                           property  b         true                                     emits-change
```
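The output above shows how the configured name "NVIDIA GB200 GPU" maps to the D-Bus object path and to the Redfish sensor Id. A minimal sketch of that mapping, inferred only from this output: it assumes spaces become underscores and that the Redfish Id is the sensor type and escaped name joined by an underscore. The helper names are hypothetical, and the real app and bmcweb may escape additional characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical escaping, inferred from the output above: spaces in the
// configured sensor name are replaced with underscores.
std::string escapeSensorName(std::string name)
{
    for (char& c : name)
    {
        if (c == ' ')
        {
            c = '_';
        }
    }
    return name;
}

// Object path under which the sensor appears in the busctl tree:
// /xyz/openbmc_project/sensors/<type>/<escaped name>
std::string sensorObjectPath(const std::string& type, const std::string& name)
{
    return "/xyz/openbmc_project/sensors/" + type + "/" +
           escapeSensorName(name);
}

// Redfish sensor Id as seen in the curl output: <type>_<escaped name>
std::string redfishSensorId(const std::string& type, const std::string& name)
{
    return type + "_" + escapeSensorName(name);
}
```

With `"temperature"` and `"NVIDIA GB200 GPU"`, these produce `/xyz/openbmc_project/sensors/temperature/NVIDIA_GB200_GPU` and `temperature_NVIDIA_GB200_GPU`, matching the busctl and Redfish output.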

[1] https://www.opencompute.org/documents/ocp-gpu-accelerator-management-interfaces-v1-pdf

Change-Id: Ied938b9e5c19751ee283b4b948e16c905c78fb48
Signed-off-by: Harshit Aghera <haghera@nvidia.com>



# d837b56c 21-Apr-2025 Harshit Aghera <haghera@nvidia.com>

nvidia-gpu: add entity-manager support

This commit adds support for reading entity-manager configurations to
the GPU D-Bus sensor app.

Tested.

Build an image for the gb200nvl-obmc machine with the following patches
cherry-picked. These patches are needed to enable the MCTP stack.

https://gerrit.openbmc.org/c/openbmc/openbmc/+/79312
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422

Copy the gpusensor app and run it.
```
root@gb200nvl-obmc:~# ./nvidiagpusensor
```

The app detects the entity-manager configuration on the gb200nvl-obmc
machine. It also detects all the endpoints in the MCTP service's D-Bus
tree.
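Endpoint discovery can be pictured as a filter over the MCTP service's D-Bus objects. A minimal sketch, assuming each endpoint exposes its endpoint ID and the list of MCTP message types it supports (on OpenBMC these are typically the `EID` and `SupportedMessageTypes` properties of `xyz.openbmc_project.MCTP.Endpoint`; treat that as an assumption here). The `MctpEndpoint` struct and `vdmCapableEids` function are hypothetical; only endpoints advertising message type 0x7E (Vendor Defined - PCI) are kept as candidates for VDM requests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// MCTP message type for "Vendor Defined - PCI" messages (DMTF DSP0239).
constexpr uint8_t mctpMsgTypeVendorDefinedPci = 0x7E;

// Hypothetical, simplified view of one endpoint object from the MCTP service's
// D-Bus tree; the fields are assumed to mirror EID and SupportedMessageTypes.
struct MctpEndpoint
{
    uint8_t eid;
    std::vector<uint8_t> supportedMessageTypes;
};

// Return the EIDs of endpoints that advertise vendor-defined (PCI) messages.
std::vector<uint8_t> vdmCapableEids(const std::vector<MctpEndpoint>& endpoints)
{
    std::vector<uint8_t> eids;
    for (const auto& ep : endpoints)
    {
        const auto& types = ep.supportedMessageTypes;
        if (std::find(types.begin(), types.end(),
                      mctpMsgTypeVendorDefinedPci) != types.end())
        {
            eids.push_back(ep.eid);
        }
    }
    return eids;
}
```

Filtering on the message type alone only narrows the candidates; a real app would still need to confirm that an endpoint is actually a supported GPU before reading sensors from it.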

Change-Id: I05a0597964bcc0c135484fed714b6f677adc5891
Signed-off-by: Harshit Aghera <haghera@nvidia.com>



# 82d4a623 21-Apr-2025 Harshit Aghera <haghera@nvidia.com>

nvidia-gpu: add gpu sensor app

This commit adds a D-Bus sensor app that uses the MCTP VDM protocol to
read the temperature sensor value from the GPU.

The MCTP VDM protocol is an extension of the OCP Accelerator Management
Interface specification:

https://www.opencompute.org/documents/ocp-gpu-accelerator-management-interfaces-v1-pdf

Tested.

Copy the gpusensor app to the gb200nvl-obmc machine and run it.
```
root@gb200nvl-obmc:~# ./nvidiagpusensor
```

The app runs without errors.
```
root@gb200nvl-obmc:~# busctl tree xyz.openbmc_project.GpuSensor
└─ /xyz
  └─ /xyz/openbmc_project
    └─ /xyz/openbmc_project/sensors
```

Change-Id: Iee7376a9116489052c690f2e3a1ca8d0f29564dd
Signed-off-by: Harshit Aghera <haghera@nvidia.com>
