History log of /openbmc/dbus-sensors/src/nvidia-gpu/NvidiaDeviceDiscovery.cpp (Results 1 – 3 of 3)
Revision Date Author Comments
# 0ad3a7e8 26-Jun-2025 Deepak Kodihalli <deepak.kodihalli.83@gmail.com>

nvidia-gpu: fix crash due to optional EM config

The sensor poll rate Entity Manager (EM) config is optional. Handle the
case where it is missing by assigning a default poll rate.

See https://gerrit.openbmc.org/c/openbmc/entity-manager/+/80579/.

Change-Id: I1ceac45ba8adae33affe0cdd19513484179b1e4c
Signed-off-by: Deepak Kodihalli <deepak.kodihalli.83@gmail.com>
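
The fallback described in this commit message can be illustrated with a minimal sketch. It is not the code from the change itself: `ConfigMap`, `readPollRate`, the `"PollRate"` key, and the 1000 ms default are all hypothetical names and values chosen for illustration.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical stand-in for the property map that a configuration source
// such as Entity Manager might expose; not the real dbus-sensors type.
using ConfigValue = std::variant<std::string, uint64_t, double>;
using ConfigMap = std::map<std::string, ConfigValue>;

// Assumed default; the actual default is not stated in the commit message.
constexpr std::chrono::milliseconds defaultPollRate{1000};

// Returns the configured poll rate, or the default when the optional
// "PollRate" entry (hypothetical key) is missing or has the wrong type.
std::chrono::milliseconds readPollRate(const ConfigMap& config)
{
    auto it = config.find("PollRate");
    if (it == config.end())
    {
        return defaultPollRate; // optional entry missing: do not crash
    }
    if (const auto* ms = std::get_if<uint64_t>(&it->second))
    {
        using Rep = std::chrono::milliseconds::rep;
        return std::chrono::milliseconds(static_cast<Rep>(*ms));
    }
    return defaultPollRate; // entry present but not an unsigned integer
}
```

The point of the sketch is that a missing optional entry is treated as a normal case with a defined result, rather than being dereferenced unconditionally.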



# 8951c87e 25-Jun-2025 Harshit Aghera <haghera@nvidia.com>

nvidia-gpu: add SMA Temperature Sensor

Add support for device type SMA (System Management Agent) and its
temperature sensor. It is typically an MCU device.

Tested: Build an image for the gb200nvl-obmc machine with the following
patches cherry-picked. These patches are needed to enable the MCTP stack.

https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422

```
$ curl -s -k -u 'root:0penBmc' https://10.137.203.193/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_SMA_255_TEMP_0
{
  "@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_SMA_255_TEMP_0",
  "@odata.type": "#Sensor.v1_2_0.Sensor",
  "Id": "temperature_NVIDIA_GB200_GPU_SMA_255_TEMP_0",
  "Name": "NVIDIA GB200 GPU SMA 255 TEMP 0",
  "Reading": 34.0,
  "ReadingRangeMax": 127.0,
  "ReadingRangeMin": -128.0,
  "ReadingType": "Temperature",
  "ReadingUnits": "Cel",
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  }
}
```

Change-Id: I560864758036a5b6ea6c1745145736c7bfa0a1c5
Signed-off-by: Harshit Aghera <haghera@nvidia.com>



# 4ecdfaaa 22-May-2025 Harshit Aghera <haghera@nvidia.com>

nvidia-gpu: introduce notion of a device

Perform device discovery tasks only once per device to prepare for
introducing additional gpu sensors.

In the current implementation, sensor updates and device discovery via
MCTP are managed within a single class for simplicity. However, since a
GPU device typically includes multiple sensors, performing device
discovery for each individual sensor is inefficient. Instead, it would
be more effective to execute device discovery once per device.

Tested: Build an image for the gb200nvl-obmc machine with the following
patches cherry-picked. These patches are needed to enable the MCTP stack.

https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422

```
$ curl -k -u 'root:0penBmc' https://10.137.203.137/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_0
{
  "@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_0",
  "@odata.type": "#Sensor.v1_2_0.Sensor",
  "Id": "temperature_NVIDIA_GB200_GPU_0_TEMP_0",
  "Name": "NVIDIA GB200 GPU 0 TEMP 0",
  "Reading": 37.6875,
  "ReadingRangeMax": 127.0,
  "ReadingRangeMin": -128.0,
  "ReadingType": "Temperature",
  "ReadingUnits": "Cel",
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  }
}
```

Change-Id: Ie3dcd43caa031b4aaa61d8be3f5d71aefd53bc9a
Signed-off-by: Harshit Aghera <haghera@nvidia.com>
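
The "discover once per device" idea from this commit message can be sketched as follows. This is only an illustration under stated assumptions, not the structure of the actual change: `Device`, `Sensor`, `init`, `discover`, the sensor names, and the fixed endpoint ID are hypothetical, and the placeholder `discover()` stands in for the real MCTP query.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sensor: it only holds what it needs to read values and
// never performs discovery itself.
struct Sensor
{
    std::string name;
    uint8_t eid; // endpoint ID resolved once by the owning device
};

// Hypothetical device: owns discovery and creates all of its sensors.
class Device
{
  public:
    explicit Device(std::string deviceName) : name(std::move(deviceName)) {}

    // Runs discovery at most once, then creates the sensors that share
    // the discovered endpoint.
    void init()
    {
        if (discovered)
        {
            return;
        }
        eid = discover();
        discovered = true;
        sensors_.push_back(Sensor{name + "_TEMP_0", eid});
        sensors_.push_back(Sensor{name + "_TEMP_1", eid});
    }

    const std::vector<Sensor>& sensors() const { return sensors_; }
    std::size_t discoveryCount() const { return discoveries; }

  private:
    // Placeholder for the real MCTP discovery; returns a fixed ID here.
    uint8_t discover()
    {
        ++discoveries;
        return 0x10; // assumed value, for illustration only
    }

    std::string name;
    uint8_t eid = 0;
    bool discovered = false;
    std::size_t discoveries = 0;
    std::vector<Sensor> sensors_;
};
```

With this shape, adding another sensor to a device costs one more `push_back` rather than one more discovery round trip.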
