| b139302c | 08-Jan-2026 |
Eric Liu <liuer@nvidia.com> |
nvidia-gpu: add BoostClockFrequency property
Implement BoostClockFrequency property for NVIDIA GPU inventory to expose the default boost clock frequency of GPU accelerators.
The property is added to the xyz.openbmc_project.Inventory.Item.Accelerator interface and uses the existing MCTP VDM Property ID 21 (DEFAULT_BOOST_CLOCKS) to query the GPU hardware over MCTP and populate the property value.
Changes:
- src/nvidia-gpu/NvidiaGpuMctpVdm.hpp: Add uint64_t to InventoryValue variant to support numeric clock speed values.
- src/nvidia-gpu/NvidiaGpuMctpVdm.cpp: Add DEFAULT_BOOST_CLOCKS case to decodeInventoryData to parse uint64_t clock speed from MCTP response payload.
- src/nvidia-gpu/Inventory.cpp: Register BoostClockFrequency property on Accelerator interface, add DEFAULT_BOOST_CLOCKS to properties query map, and handle uint64_t response in handleInventoryPropertyResponse.
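The decode step described in the change list can be sketched roughly as follows. This is only an illustration: the `InventoryValue` alternatives, the helper name `decodeClock`, the little-endian byte order, and the accepted field widths are all assumptions, not taken from the actual NvidiaGpuMctpVdm sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for the InventoryValue variant; the real set of
// alternatives in NvidiaGpuMctpVdm.hpp may differ.
using InventoryValue =
    std::variant<std::string, std::vector<uint8_t>, uint64_t>;

// Hypothetical helper: assemble 1..8 payload bytes into a uint64_t.
// Little-endian byte order is an assumption made for this sketch.
std::optional<InventoryValue> decodeClock(const std::vector<uint8_t>& payload)
{
    if (payload.empty() || payload.size() > sizeof(uint64_t))
    {
        return std::nullopt; // reject empty or oversized fields
    }
    uint64_t value = 0;
    for (std::size_t i = 0; i < payload.size(); ++i)
    {
        value |= static_cast<uint64_t>(payload[i]) << (8 * i);
    }
    return InventoryValue{std::in_place_type<uint64_t>, value};
}
```

Under these assumptions, a four-byte payload of 0x7E 0x09 0x00 0x00 would decode to 2430, the example value mentioned in the test notes.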
Tested: Build an image for nvl32-obmc machine with the following patches cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85763
https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/85080
Verified via busctl that BoostClockFrequency property appears under xyz.openbmc_project.Inventory.Item.Accelerator interface for GPU devices and contains the correct boost clock value (e.g., 2430 MHz). Confirmed successful MCTP query and property update through nvidiagpusensor service logs.
Change-Id: I3d7410e1b1a455a81263c89f63ac1c6338eeefe1 Signed-off-by: Eric Liu <liuer@nvidia.com>
|
| 4c0a0b45 | 29-Dec-2025 |
Ender Hsieh <andhsieh@nvidia.com> |
nvidia-gpu: implement PhysicalContext interface
This change implements the xyz.openbmc_project.Common.PhysicalContext interface for NVIDIA GPU sensors (Energy, Power, Temperature, and Voltage). This allows sensors to expose their hardware context information to external interfaces like Redfish.
Instead of hardcoding the PhysicalContext value, this implementation uses a device type enum (gpu::DeviceIdentification) to determine the appropriate PhysicalContext. A helper function maps device types to their corresponding D-Bus PhysicalContext values, ensuring proper separation of concerns.
For GPU devices, the Type property is set to 'GPU'. For other device types (SMA, PCIe), no PhysicalContext interface is created, keeping their D-Bus representation clean.
This implementation follows the interface definition introduced in: https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/86504
Changes:
- src/nvidia-gpu/NvidiaSensorUtils.hpp: New helper file containing deviceTypeToPhysicalContext() function that maps DeviceIdentification enum to D-Bus PhysicalContext paths.
- src/nvidia-gpu/*.hpp: Update sensor constructors to accept gpu::DeviceIdentification deviceType parameter (no default value).
- src/nvidia-gpu/*Sensor.cpp: Use helper function to determine PhysicalContext. Conditionally create interface only when a valid context is returned. Register 'Type' property with the mapped value.
- src/nvidia-gpu/NvidiaGpuDevice.cpp: Pass DEVICE_GPU enum for all GPU sensors. Fix io member initialization in constructor.
- src/nvidia-gpu/NvidiaSmaDevice.cpp: Pass DEVICE_SMA enum for SMA sensors (no PhysicalContext interface created).
Design rationale:
- GpuDevice class doesn't need to know D-Bus implementation details
- Centralized mapping function makes maintenance easier
- Type-safe enum prevents typos and provides compile-time checking
- Automatic handling for different device types without conditional logic in device classes
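A minimal sketch of the centralized mapping described above, under stated assumptions: the enumerator names DEVICE_GPU and DEVICE_SMA come from the commit message, while DEVICE_PCIE, the enumerator values, and the exact D-Bus enum string are placeholders chosen for illustration and may not match NvidiaSensorUtils.hpp.

```cpp
#include <cstdint>
#include <optional>
#include <string>

namespace gpu
{
// DEVICE_GPU and DEVICE_SMA are named in the commit message; DEVICE_PCIE and
// the underlying values are placeholders for this sketch.
enum class DeviceIdentification : uint8_t
{
    DEVICE_GPU,
    DEVICE_SMA,
    DEVICE_PCIE,
};
} // namespace gpu

// Returns the PhysicalContext value for device types that have one, and
// std::nullopt otherwise so the caller can skip creating the interface.
inline std::optional<std::string>
    deviceTypeToPhysicalContext(gpu::DeviceIdentification type)
{
    switch (type)
    {
        case gpu::DeviceIdentification::DEVICE_GPU:
            // Assumed spelling of the D-Bus enum value.
            return std::string{"xyz.openbmc_project.Common.PhysicalContext."
                               "PhysicalContextType.GPU"};
        default:
            return std::nullopt;
    }
}
```

A sensor constructor written this way would only add the PhysicalContext interface when the returned optional holds a value, which matches the behavior described for SMA sensors.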
Tested: Build an image for nvl32-obmc machine with the following patch cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
Verified via busctl that the Type property appears under xyz.openbmc_project.Common.PhysicalContext interface for GPU sensors only and contains the correct value 'GPU'. SMA device sensors do not have this interface, as expected.
Depends-On: I83dcbe4810139fb92fddf6b099f5a1a057e7e05e Change-Id: I1d5abfa5d4416af3565bf315e0f28cb6af56f14c Signed-off-by: Ender Hsieh <andhsieh@nvidia.com>
|
| 7427aeef | 17-Oct-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add ConnectX Ethernet Port Metrics
Add xyz.openbmc_project.Metric.Value interface for each of the following Ethernet port metrics of a ConnectX device.
- TXBytes
- RXBytes
- RXMulticastFrames
- TXMulticastFrames
- RXUnicastFrames
- TXUnicastFrames
- RXBroadcastFrames
- TXBroadcastFrames
- RXFCSErrors
- RXFrameAlignmentErrors
- RXFalseCarrierErrors
- RXUndersizeFrames
- RXOversizeFrames
- RXPauseXONFrames
- RXPauseXOFFFrames
- TXPauseXONFrames
- TXPauseXOFFFrames
- TXSingleCollisions
- TXMultipleCollisions
- TXLateCollisions
- TXExcessiveCollisions
PDI Patch - https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/84847
Tested: Build an image for nvl32-obmc machine with the following patch cherry picked.
https://gerrit.openbmc.org/c/openbmc/entity-manager/+/84257 https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
The openbmc patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
```
root@nvl32-obmc:~# busctl tree xyz.openbmc_project.GpuSensor
`- /xyz
  `- /xyz/openbmc_project
    |- /xyz/openbmc_project/inventory
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_NIC
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_NIC/Port_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_NIC/Port_2
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/DOWN_0
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/DOWN_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/UP_0
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_NIC
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_NIC/Port_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_NIC/Port_2
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/DOWN_0
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/DOWN_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/UP_0
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_NIC
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_NIC/Port_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_NIC/Port_2
    | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe
    |   |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/DOWN_0
    |   |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/DOWN_1
    |   `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/UP_0
    |- /xyz/openbmc_project/metric
    | |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1
    | | `- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_broadcast_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_bytes
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_false_carrier_errors
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_fcs_errors
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_frame_alignment_errors
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_multicast_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_oversize_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_pause_xoff_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_pause_xon_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_undersize_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/rx_unicast_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_broadcast_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_bytes
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_excessive_collisions
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_late_collisions
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_multicast_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_multiple_collisions
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_pause_xoff_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_pause_xon_frames
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_single_collisions
    | |   `- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_NIC_Port_1/nic/tx_unicast_frames

root@nvl32-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/metric/port_Nvidia_ConnectX_3_NIC_Port_2/nic/rx_bytes
NAME                                        TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable         interface -         -                                        -
.Introspect                                 method    -         s                                        -
org.freedesktop.DBus.Peer                   interface -         -                                        -
.GetMachineId                               method    -         s                                        -
.Ping                                       method    -         -                                        -
org.freedesktop.DBus.Properties             interface -         -                                        -
.Get                                        method    ss        v                                        -
.GetAll                                     method    s         a{sv}                                    -
.Set                                        method    ssv       -                                        -
.PropertiesChanged                          signal    sa{sv}as  -                                        -
xyz.openbmc_project.Association.Definitions interface -         -                                        -
.Associations                               property  a(sss)    1 "measuring" "measured_by" "/xyz/ope... emits-change
xyz.openbmc_project.Metric.Value            interface -         -                                        -
.Unit                                       property  s         "xyz.openbmc_project.Metric.Value.Uni... emits-change
.Value                                      property  d         0                                        emits-change
```
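The object paths in the busctl output follow a visible pattern: a `port_` prefix, the device name and port name joined by an underscore, a group segment (`nic` here), and the metric name. The helper below is a hypothetical sketch that reproduces that pattern; the function name and parameter split are assumptions, not the daemon's actual code.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper that rebuilds the metric object paths seen in the
// busctl tree output. The parameter split is chosen for illustration only.
std::string portMetricPath(std::string_view device, std::string_view port,
                           std::string_view group, std::string_view metric)
{
    std::string path{"/xyz/openbmc_project/metric/port_"};
    path.append(device);
    path.append("_");
    path.append(port);
    path.append("/");
    path.append(group);
    path.append("/");
    path.append(metric);
    return path;
}
```

For instance, `portMetricPath("Nvidia_ConnectX_0_NIC", "Port_1", "nic", "rx_bytes")` yields the same `rx_bytes` path that appears in the tree above.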
Change-Id: I30123e35b759182039cb6f25526fafe733c0f354 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| 1180ed47 | 30-Sep-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add support for PCIe port metrics
Add xyz.openbmc_project.Metric.Value interface for each of the following PCIe port metrics of a ConnectX device.
PCIeErrors.CorrectableErrorCount
PCIeErrors.NonFatalErrorCount
PCIeErrors.FatalErrorCount
PCIeErrors.L0ToRecoveryCount
PCIeErrors.ReplayCount
PCIeErrors.ReplayRolloverCount
PCIeErrors.NAKSentCount
PCIeErrors.NAKReceivedCount
PCIeErrors.UnsupportedRequestCount
PDI Patch - https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/84839
Tested: Build an image for nvl32-obmc machine with the following patch cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
The patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
```
root@nvl32-obmc:~# busctl tree xyz.openbmc_project.GpuSensor
`- /xyz
  `- /xyz/openbmc_project
    |- /xyz/openbmc_project/inventory
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/DOWN_0
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/DOWN_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_0_PCIe/UP_0
    | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/DOWN_0
    | | |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/DOWN_1
    | | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_2_PCIe/UP_0
    | `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe
    |   |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/DOWN_0
    |   |- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/DOWN_1
    |   `- /xyz/openbmc_project/inventory/Nvidia_ConnectX_3_PCIe/UP_0
    |- /xyz/openbmc_project/metric
    | |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0
    | | `- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/correctable_error_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/fatal_error_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/l0_to_recovery_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/nak_received_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/nak_sent_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/non_fatal_error_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/replay_count
    | |   |- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/replay_rollover_count
    | |   `- /xyz/openbmc_project/metric/port_Nvidia_ConnectX_0_PCIe_DOWN_0/pcie/unsupported_request_count

root@nvl32-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/metric/port_Nvidia_ConnectX_3_PCIe_DOWN_1/pcie/l0_to_recovery_count
NAME                                        TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable         interface -         -                                        -
.Introspect                                 method    -         s                                        -
org.freedesktop.DBus.Peer                   interface -         -                                        -
.GetMachineId                               method    -         s                                        -
.Ping                                       method    -         -                                        -
org.freedesktop.DBus.Properties             interface -         -                                        -
.Get                                        method    ss        v                                        -
.GetAll                                     method    s         a{sv}                                    -
.Set                                        method    ssv       -                                        -
.PropertiesChanged                          signal    sa{sv}as  -                                        -
xyz.openbmc_project.Association.Definitions interface -         -                                        -
.Associations                               property  a(sss)    1 "measuring" "measured_by" "/xyz/ope... emits-change
xyz.openbmc_project.Metric.Value            interface -         -                                        -
.Unit                                       property  s         "xyz.openbmc_project.Metric.Value.Uni... emits-change
.Value                                      property  d         1                                        emits-change
```
Change-Id: I3379c09346653d6a6bf2921bf765f0adf5a22098 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| b341fa2b | 02-Dec-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: enable gpu software inventory
The patch uses the MCTP VDM command to retrieve the GPU driver version and updates the DBus interface xyz.openbmc_project.Software.Version with this information at DBus object path /xyz/openbmc_project/software/. The patch also associates software inventory to the chassis inventory item. The GPU driver version is made available in Redfish at the URI /redfish/v1/UpdateService/FirmwareInventory/.
Tested: Build an image for nvl32-obmc machine with the following patches cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
The patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
The GPU driver version shows up on the DBus.
Change-Id: I712fe0952a02f36e386d3f37a5d4a8192ba641de Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| 68a8e2dd | 29-Sep-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add support for PCIe port telemetry
Add xyz.openbmc_project.Inventory.Connector.Port Interface for each PCIe port of a ConnectX device.
PDI patches to extend the xyz.openbmc_project.Inventory.Connector.Port Interface - https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/84653 https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/84652
Tested: Build an image for nvl32-obmc machine with the following patch cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
The patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
```
root@nvl32-obmc:~# busctl tree xyz.openbmc_project.GpuSensor
`- /xyz
  `- /xyz/openbmc_project
    |- /xyz/openbmc_project/inventory
    | `- /xyz/openbmc_project/inventory/pcie_devices
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0/DOWN_0
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0/DOWN_1
    |   | `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0/UP_0
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1/DOWN_0
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1/DOWN_1
    |   | `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1/UP_0
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_2
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_2/DOWN_0
    |   | |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_2/DOWN_1
    |   | `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_2/UP_0
    |   `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_3
    |     |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_3/DOWN_0
    |     |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_3/DOWN_1
    |     `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_3/UP_0
    `- /xyz/openbmc_project/sensors

root@nvl32-obmc:~# busctl -l introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1/DOWN_0
NAME                                         TYPE      SIGNATURE RESULT/VALUE                                                                                  FLAGS
org.freedesktop.DBus.Introspectable          interface -         -                                                                                             -
.Introspect                                  method    -         s                                                                                             -
org.freedesktop.DBus.Peer                    interface -         -                                                                                             -
.GetMachineId                                method    -         s                                                                                             -
.Ping                                        method    -         -                                                                                             -
org.freedesktop.DBus.Properties              interface -         -                                                                                             -
.Get                                         method    ss        v                                                                                             -
.GetAll                                      method    s         a{sv}                                                                                         -
.Set                                         method    ssv       -                                                                                             -
.PropertiesChanged                           signal    sa{sv}as  -                                                                                             -
xyz.openbmc_project.Association.Definitions  interface -         -                                                                                             -
.Associations                                property  a(sss)    1 "connected_to" "connecting" "/xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1" emits-change
xyz.openbmc_project.Inventory.Connector.Port interface -         -                                                                                             -
.PortProtocol                                property  s         "xyz.openbmc_project.Inventory.Connector.Port.PortProtocol.PCIe"                              emits-change
.PortType                                    property  s         "xyz.openbmc_project.Inventory.Connector.Port.PortType.DownstreamPort"                        emits-change
.Speed                                       property  t         34359738368                                                                                   emits-change
.Width                                       property  u         16                                                                                            emits-change
```
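As a small arithmetic aside, the reported `Speed` value 34359738368 equals 32 × 2^30. The commit message does not state the unit, so the sketch below only checks that arithmetic; reading the result as a per-lane rate of 32 in 2^30-scaled units is an assumption, and the helper name is hypothetical.

```cpp
#include <cstdint>

// Value copied from the busctl output above.
constexpr uint64_t reportedSpeed = 34359738368ULL;

// 2^30; treating this as the scaling factor is an assumption.
constexpr uint64_t gibi = 1ULL << 30;

// Compile-time check of the arithmetic only.
static_assert(reportedSpeed == 32 * gibi);

// Hypothetical helper: convert a raw Speed value to 2^30-scaled units.
constexpr uint64_t speedInGibiUnits(uint64_t raw)
{
    return raw / gibi;
}
```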
Change-Id: I2845f090ac92c8ff6a742ec83c23073e6ea4e1b6 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| 6ef89739 | 21-Oct-2025 |
Ed Tanous <etanous@nvidia.com> |
nvidia-gpu: Use common class for mctp endpoints
The common endpoint class should be used for send and receive.
Tested: Used by MctpRequester, tested on nvl32-obmc
Change-Id: I0060a66a5bcb4decfbe663d46ba88529e01e2209 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 964057d1 | 17-Nov-2025 |
George Liu <liuxiwei@ieisystem.com> |
Remove redundant is_method_error() checks
The handlers registered through sdbusplus::bus::match_t only receive D-Bus signals. Signal messages are never sent as method-error replies, and therefore message.is_method_error() can never be true in these callbacks.
This change removes all unnecessary is_method_error() checks from signal handlers to simplify the code and avoid confusion.
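The reasoning can be illustrated with a small model of D-Bus message types. The numeric type codes follow the D-Bus specification; `Message`, `isMethodError`, and `dispatchToMatch` are simplified stand-ins written for this sketch and are not the sdbusplus API.

```cpp
#include <cstdint>

// Message type codes as defined by the D-Bus specification.
enum class MessageType : uint8_t
{
    Invalid = 0,
    MethodCall = 1,
    MethodReturn = 2,
    Error = 3,
    Signal = 4,
};

// Simplified stand-in for a bus message.
struct Message
{
    MessageType type;
};

// Models the check: true only for error replies.
constexpr bool isMethodError(const Message& m)
{
    return m.type == MessageType::Error;
}

// Models a signal match: the handler runs only for Signal messages, so
// isMethodError() inside the handler can never be true.
template <typename Handler>
bool dispatchToMatch(const Message& m, Handler&& handler)
{
    if (m.type != MessageType::Signal)
    {
        return false; // not delivered to signal handlers
    }
    handler(m);
    return true;
}
```

In this model an error message never reaches the handler, which is why a check inside the handler is dead code.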
Change-Id: I43e4a564c1bf401a5da9819dd201464e4a59c871 Signed-off-by: George Liu <liuxiwei@ieisystem.com>
|
| 33ba62c7 | 07-Nov-2025 |
Harshit Aghera <haghera@nvidia.com> |
request maintainer role for nvidia-gpu
I have actively contributed to and reviewed patches for the nvidia-gpu application since its inception in May 2025. Additionally, I have contributed to and reviewed phosphor-dbus-interfaces and bmcweb patches related to the nvidia-gpu application.
Change-Id: I8eca227699b09c5cdb49495d5237a545c8609e86 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| 77239da5 | 24-Nov-2025 |
Ed Tanous <etanous@nvidia.com> |
Fix test build if nvidia-gpu is disabled
When nvidia-gpu is disabled, unit tests don't build because of the shared gpusensor_sources variable. Make a quick fix to fix the build. Going forward we may need the option checking to be put into the sub meson file rather than the top level, so that unit test deps can build separately.
Change-Id: Ib87487fe15e80df44afbd9c3421163c6fbc16f74 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 064e6ff7 | 27-Oct-2025 |
Deepak Kodihalli <deepak.kodihalli.83@gmail.com> |
nvidia-gpu: fix GPU power PeakReading PDI usage
The GPU power peak reading, which uses the Telemetry.Report PDI, was relying on a string ("PeakReading") to expose the reading. This string is Redfish specific. Instead, use the OperationType.Maximum enum defined in the PDI. Bmcweb code can map this to PeakReading.
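A consumer-side mapping of the kind the message alludes to could look roughly like the sketch below. Only `Maximum` and the Redfish name `PeakReading` come from the commit message; the other enumerators and the function name `redfishReadingName` are assumptions for illustration, not bmcweb code.

```cpp
#include <optional>
#include <string_view>

// Maximum is named in the commit message; the remaining enumerators are
// assumed here for illustration.
enum class OperationType
{
    Maximum,
    Minimum,
    Average,
    Summation,
};

// Hypothetical mapping from the D-Bus operation type to a Redfish property
// name. Only the Maximum -> PeakReading pairing is stated in the source.
constexpr std::optional<std::string_view> redfishReadingName(OperationType op)
{
    if (op == OperationType::Maximum)
    {
        return std::string_view{"PeakReading"};
    }
    return std::nullopt; // no mapping described for other operation types
}
```

Keeping the Redfish-specific string on the consumer side means the D-Bus producer only deals with the enum defined in the PDI.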
Tested: Build an image for nvl32-obmc machine with the following patches cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490 https://gerrit.openbmc.org/c/openbmc/bmcweb/+/82449.
The patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
The GPU Power PeakReading is correctly reported on DBus and on Redfish.
Change-Id: I39b2b4987d845f878ffdedcfdb02cdfdc02a4499 Signed-off-by: Deepak Kodihalli <deepak.kodihalli.83@gmail.com> Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| e0b80e1e | 28-Aug-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add support for ConnectX device
Add support to discover ConnectX devices and to populate PCIe interface properties using Phosphor DBus Interface xyz.openbmc_project.Inventory.Item.PCIeDevice.
A ConnectX device has an integrated PCIe Switch. The patch uses the xyz.openbmc_project.Inventory.Item.PCIeSwitch PDI to define the PCIe Switch resource.
Tested: Build an image for nvl32-obmc machine with the following patch cherry picked.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/85490
The patch cherry-picks the following patches that are currently under review.
```
1. device tree
   https://lore.kernel.org/all/aRbLqH8pLWCQryhu@molberding.nvidia.com/
2. mctpd patches
   https://github.com/CodeConstruct/mctp/pull/85
3. u-boot changes
   https://lore.kernel.org/openbmc/20251121-msx4-v1-0-fc0118b666c1@nvidia.com/T/#t
4. kernel changes as specified in the openbmc patch (for espi)
5. entity-manager changes
   https://gerrit.openbmc.org/c/openbmc/entity-manager/+/85455
6. platform-init changes
   https://gerrit.openbmc.org/c/openbmc/platform-init/+/85456
7. spi changes
   https://lore.kernel.org/all/20251121-w25q01jv_fixup-v1-1-3d175050db73@nvidia.com/
```
```
root@nvl32-bmc:~# busctl tree xyz.openbmc_project.GpuSensor
`- /xyz
  `- /xyz/openbmc_project
    |- /xyz/openbmc_project/inventory
    | `- /xyz/openbmc_project/inventory/pcie_devices
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_1
    |   |- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_2
    |   `- /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_3

root@nvl32-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/inventory/pcie_devices/Nvidia_ConnectX_0
NAME                                          TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable           interface -         -                                        -
.Introspect                                   method    -         s                                        -
org.freedesktop.DBus.Peer                     interface -         -                                        -
.GetMachineId                                 method    -         s                                        -
.Ping                                         method    -         -                                        -
org.freedesktop.DBus.Properties               interface -         -                                        -
.Get                                          method    ss        v                                        -
.GetAll                                       method    s         a{sv}                                    -
.Set                                          method    ssv       -                                        -
.PropertiesChanged                            signal    sa{sv}as  -                                        -
xyz.openbmc_project.Inventory.Item.PCIeDevice interface -         -                                        -
.GenerationInUse                              property  s         "xyz.openbmc_project.Inventory.Item.P... emits-change
.GenerationSupported                          property  s         "xyz.openbmc_project.Inventory.Item.P... emits-change
.LanesInUse                                   property  u         8                                        emits-change
.MaxLanes                                     property  u         16                                       emits-change
xyz.openbmc_project.Inventory.Item.PCIeSwitch interface -         -                                        -

$ curl -s -k -u 'root:0penBmc' https://${bmc_ip}/redfish/v1/Systems/system/PCIeDevices/Nvidia_ConnectX_0
{
  "@odata.id": "/redfish/v1/Systems/system/PCIeDevices/Nvidia_ConnectX_0",
  "@odata.type": "#PCIeDevice.v1_19_0.PCIeDevice",
  "Id": "Nvidia_ConnectX_0",
  "Name": "PCIe Device",
  "PCIeFunctions": {
    "@odata.id": "/redfish/v1/Systems/system/PCIeDevices/Nvidia_ConnectX_0/PCIeFunctions"
  },
  "PCIeInterface": {
    "LanesInUse": 8,
    "MaxLanes": 16,
    "MaxPCIeType": "Gen5",
    "PCIeType": "Gen5"
  },
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  }
}%
```
Change-Id: Id89ce8a298ebb16934e94efcb9ca4679f91a7b26 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| db74edb9 | 29-Sep-2025 |
Ed Tanous <etanous@nvidia.com> |
nvidia-gpu: move unused member
Move MaxMessageSize to where it's used
Change-Id: I6c45157e6e3e52672cab86c82af1ea45a3628d19 Signed-off-by: Ed Tanous <etanous@nvidia.com> |
| 779d84f0 | 29-Sep-2025 |
Ed Tanous <etanous@nvidia.com> |
nvidia-gpu: Declare send endpoint on stack
There's no reason to store this small class in between transactions. Just construct on stack as part of the send.
Tested: On last patchset in series
Change-Id: I00090942665f022bfa2552b9c31c7c3da000646b Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| b5e823f7 | 09-Oct-2025 |
Ed Tanous <ed@tanous.net> |
Change copyright to match linux foundation
We should use SPDX identifiers wherever possible for simplification.
Change-Id: If3a7bfe506d7fded64a3ac929cc643834b16303e Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 3f6bc731 | 23-Jul-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add TLimit sensor properties
Add support for DMTF Redfish properties ReadingBasis and Implementation for GPU TLimit sensor [1].
Property Implementation for TLimit is set to Synthesized because the GPU incorporates intelligent logic that determines the temperature delta from the first thermal management software slowdown event. TLimit is derived from other reported GPU sensors, such as HBM, Tavg, and others.
DBus Interface definition - https://gerrit.openbmc.org/c/openbmc/phosphor-dbus-interfaces/+/81658
Tested: Build an image for gb200nvl-obmc machine with the following patches cherry picked. These patches are needed to enable the mctp stack.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422
```
> curl -s -k -u 'root:0penBmc' https://10.137.203.137/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_1
{
  "@odata.id": "/redfish/v1/Chassis/NVIDIA_GB200_1/Sensors/temperature_NVIDIA_GB200_GPU_0_TEMP_1",
  "@odata.type": "#Sensor.v1_2_0.Sensor",
  "Description": "Thermal Limit(TLIMIT) Temperature is the distance in deg C from the GPU temperature to the first throttle limit.",
  "Id": "temperature_NVIDIA_GB200_GPU_0_TEMP_1",
  "Implementation": "Synthesized",
  "Name": "NVIDIA GB200 GPU 0 TEMP 1",
  "Reading": 56.59375,
  "ReadingBasis": "Headroom",
  "ReadingRangeMax": 127.0,
  "ReadingRangeMin": -128.0,
  "ReadingType": "Temperature",
  "ReadingUnits": "Cel",
  "Status": {
    "Health": "OK",
    "State": "Enabled"
  }
}%

root@gb200nvl-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/sensors/temperature/NVIDIA_GB200_GPU_0_TEMP_1
NAME                                                  TYPE      SIGNATURE RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable                   interface -         -                                        -
.Introspect                                           method    -         s                                        -
org.freedesktop.DBus.Peer                             interface -         -                                        -
.GetMachineId                                         method    -         s                                        -
.Ping                                                 method    -         -                                        -
org.freedesktop.DBus.Properties                       interface -         -                                        -
.Get                                                  method    ss        v                                        -
.GetAll                                               method    s         a{sv}                                    -
.Set                                                  method    ssv       -                                        -
.PropertiesChanged                                    signal    sa{sv}as  -                                        -
xyz.openbmc_project.Association.Definitions           interface -         -                                        -
.Associations                                         property  a(sss)    1 "chassis" "all_sensors" "/xyz/openb... emits-change
xyz.openbmc_project.Inventory.Item                    interface -         -                                        -
.PrettyName                                           property  s         "Thermal Limit(TLIMIT) Temperature is... emits-change
xyz.openbmc_project.Sensor.Type                       interface -         -                                        -
.Implementation                                       property  s         "xyz.openbmc_project.Sensor.Type.Impl... emits-change
.ReadingBasis                                         property  s         "xyz.openbmc_project.Sensor.Type.Read... emits-change
xyz.openbmc_project.Sensor.Value                      interface -         -                                        -
.MaxValue                                             property  d         127                                      emits-change
.MinValue                                             property  d         -128                                     emits-change
.Unit                                                 property  s         "xyz.openbmc_project.Sensor.Value.Uni... emits-change
.Value                                                property  d         56.6836                                  emits-change writable
xyz.openbmc_project.Sensor.ValueMutability            interface -         -                                        -
.Mutable                                              property  b         true                                     emits-change
xyz.openbmc_project.State.Decorator.Availability      interface -         -                                        -
.Available                                            property  b         true                                     emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface -         -                                        -
.Functional                                           property  b         true                                     emits-change
```
[1] : https://redfish.dmtf.org/schemas/v1/Sensor.v1_11_0.yaml
Change-Id: I1a16ced44c563794d561d26232a5e5fba041b875 Signed-off-by: Harshit Aghera <haghera@nvidia.com>
|
| 1851f645 | 29-Sep-2025 |
Marc Olberding <molberding@nvidia.com> |
nvidia-gpu: Fix thresholds for GPU_TEMP_1
Fixes thresholds for GPU_TEMP_1 to be upper critical, warning, and shutdown, rather than lower critical, et al.
Change-Id: I580766288f3d27a48c75f00ea1dab13f0284bed6 Signed-off-by: Marc Olberding <molberding@nvidia.com>
|
| fd4a3779 | 24-Sep-2025 |
Marc Olberding <molberding@nvidia.com> |
nvidia-gpu: Fix a number of object lifetime issues
Moves all subsensors and objects treated as shared_ptrs to use shared_from_this. This way, if there's an object lifetime issue, we don't segfault.
Also separates construction and asio init for NvidiaSmaDevice so that when we bind to this, it's valid after we leave the ctor.
Change-Id: I8e3115bc276d2e0eaac0b1dc9a9d2c46e6751d4b Signed-off-by: Marc Olberding <molberding@nvidia.com>
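A minimal sketch of the pattern this commit describes, assuming a simplified setup: construct the object first, then start asynchronous work in a separate init step so that shared_from_this() is valid, and let each pending callback hold a shared_ptr to keep the object alive. FakeExecutor, Device, create() and init() are hypothetical names for illustration only; the real code uses asio and the NVIDIA device classes, which are not reproduced here.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asio-style executor: it only queues callbacks.
struct FakeExecutor
{
    std::vector<std::function<void()>> pending;

    void post(std::function<void()> fn)
    {
        pending.push_back(std::move(fn));
    }

    // Runs every queued callback once, then destroys them.
    void run()
    {
        auto work = std::move(pending);
        pending.clear();
        for (auto& fn : work)
        {
            fn();
        }
    }
};

// Hypothetical device type illustrating two-phase (deferred) init.
class Device : public std::enable_shared_from_this<Device>
{
  public:
    // Construct first, then call init() once a shared_ptr owns the object.
    // Calling shared_from_this() inside the constructor would throw.
    static std::shared_ptr<Device> create(FakeExecutor& ex)
    {
        std::shared_ptr<Device> dev(new Device(ex));
        dev->init();
        return dev;
    }

    int reads() const
    {
        return readCount;
    }

  private:
    explicit Device(FakeExecutor& ex) : executor(ex) {}

    void init()
    {
        // The callback owns a shared_ptr, so the device outlives the
        // pending operation even if the caller drops its own reference.
        executor.post([self = shared_from_this()] { ++self->readCount; });
    }

    FakeExecutor& executor;
    int readCount = 0;
};
```

With this shape, dropping the last external reference before the callback runs delays destruction until the callback has finished, instead of leaving it with a dangling pointer.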
|
| 6282a452 | 29-Sep-2025 |
Marc Olberding <molberding@nvidia.com> |
nvidia-gpu: NvidiaGpuDevice fix use after free
Fixes a use after free for NvidiaGpuThresholds. Moves the storage used for communication to be part of the NvidiaGpuDevice class instead of being passed around ephemerally through free functions.
Also makes NvidiaGpuDevice inherit from std::enable_shared_from_this.
Testing: The issue found previously was core dumps on nvl32-obmc. ASan showed it was a use after free in the shared pointer in ThermalLimits.
Afterwards, no core dumps or issues were reported by ASan. Ran on an nvl32-obmc model with 8 GPUs.
Change-Id: I61b606f3a129499089718e7ec804926db5f22c64 Signed-off-by: Marc Olberding <molberding@nvidia.com>
|
| ac920734 | 28-Sep-2025 |
Marc Olberding <molberding@nvidia.com> |
nvidia-gpu: deferred init for NvidiaGpuDevice
Adds deferred init for NvidiaGpuDevice, so that when we bind to this, the this pointer is valid, i.e. after construction is completed
Change-Id: I24a53d2ab9be1a2a4431368414a154b48347d2a2 Signed-off-by: Marc Olberding <molberding@nvidia.com>
|
| d0125c9c | 08-Oct-2025 |
Marc Olberding <molberding@nvidia.com> |
nvidia-gpu: Fix up buffering in MctpRequester
This change does a lot, for better or worse:
1. Change MctpRequester to hold both buffers for send and receive.
2. This requires changing the callback structure, so the reach is far.
3. Changes error reporting to be through std::error_code.
4. Collapses the QueuingRequester and Requester into MctpRequester.
5. Doing 4 gets rid of a level of indirection and an extra unordered_map.
6. Adds proper iid support, which is made significantly easier by 4/5.
7. Fixes issues around expiry timers, where we would cancel the timer for a given request whenever a new packet came in to be sent. This could cause a lockup if a packet truly did time out and an interleaved packet finished sending. This moves each queue to have its own timer.
This fixes an issue where we were receiving buffers from clients and binding them to receive calls without ensuring that they belonged to the correct message. When the receive completed, it used the last buffer bound to async_receive_from. This caused a number of issues, ranging from incorrect device discovery results and incorrect sensor readings to core dumps.
This change moves the receive and send buffers to be owned by the MctpRequester, and a non-owning view is provided via callback to the client. All existing clients just decode in place given that buffer.
Tested: loaded onto nvl32-obmc. Correct number of sensors showed up and the readings were nominal
Change-Id: I67c843691ca79e9fcccfa16df6d611918f25f6ca Signed-off-by: Marc Olberding <molberding@nvidia.com>
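The buffer ownership described in this commit can be sketched roughly as follows, under stated assumptions: the requester owns the receive buffer, reports errors through std::error_code, and hands the client a non-owning view that it decodes in place. Requester, BufferView and completeReceive are hypothetical names, and the 64-byte buffer size is arbitrary; none of this is the actual MctpRequester API. In C++20, std::span could play the role of BufferView.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <system_error>

// Hypothetical non-owning view over bytes owned by the requester.
struct BufferView
{
    const uint8_t* data = nullptr;
    std::size_t size = 0;
};

// Hypothetical requester that owns its receive buffer.
class Requester
{
  public:
    using Callback = std::function<void(const std::error_code&, BufferView)>;

    // Simulates a completed receive: the wire bytes land in the owned
    // buffer, and the client gets a view of exactly those bytes.
    void completeReceive(const uint8_t* wire, std::size_t length,
                         const Callback& callback)
    {
        if (length > rxBuffer.size())
        {
            // Errors are reported as std::error_code, with an empty view.
            callback(std::make_error_code(std::errc::message_size),
                     BufferView{});
            return;
        }
        std::copy(wire, wire + length, rxBuffer.begin());
        callback(std::error_code{}, BufferView{rxBuffer.data(), length});
    }

  private:
    std::array<uint8_t, 64> rxBuffer{}; // size chosen only for the sketch
};
```

Because the view points into storage owned by the requester, a client must finish decoding inside the callback or copy the bytes out before returning.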
|
| 6b712322 | 31-Jul-2025 |
Harshit Aghera <haghera@nvidia.com> |
nvidia-gpu: add Power Sensor PeakReading Property
Add support for Sensor Properties PeakReading and PeakReadingTime.
Current Limitation - The ResetMetrics action is currently not supported for Redfish URIs in bmcweb. As a result, the ability to clear PeakReading values for GPU Power Sensors has not been implemented.
Future Consideration - If ResetMetrics action support is added to bmcweb in the future, the corresponding functionality will also need to be implemented in the dbus-sensor application to ensure full compatibility.
Tested: Built an image for gb200nvl-obmc machine with the following patches cherry picked. These patches are needed to enable the mctp stack.
https://gerrit.openbmc.org/c/openbmc/openbmc/+/79422
```
root@gb200nvl-obmc:~# busctl introspect xyz.openbmc_project.GpuSensor /xyz/openbmc_project/sensors/power/NVIDIA_GB200_GPU_0_Power_0
NAME                                                  TYPE      SIGNATURE   RESULT/VALUE                             FLAGS
org.freedesktop.DBus.Introspectable                   interface -           -                                        -
.Introspect                                           method    -           s                                        -
org.freedesktop.DBus.Peer                             interface -           -                                        -
.GetMachineId                                         method    -           s                                        -
.Ping                                                 method    -           -                                        -
org.freedesktop.DBus.Properties                       interface -           -                                        -
.Get                                                  method    ss          v                                        -
.GetAll                                               method    s           a{sv}                                    -
.Set                                                  method    ssv         -                                        -
.PropertiesChanged                                    signal    sa{sv}as    -                                        -
xyz.openbmc_project.Association.Definitions           interface -           -                                        -
.Associations                                         property  a(sss)      1 "chassis" "all_sensors" "/xyz/openb... emits-change
xyz.openbmc_project.Sensor.Value                      interface -           -                                        -
.MaxValue                                             property  d           5000                                     emits-change
.MinValue                                             property  d           0                                        emits-change
.Unit                                                 property  s           "xyz.openbmc_project.Sensor.Value.Uni... emits-change
.Value                                                property  d           29.194                                   emits-change writable
xyz.openbmc_project.Sensor.ValueMutability            interface -           -                                        -
.Mutable                                              property  b           true                                     emits-change
xyz.openbmc_project.State.Decorator.Availability      interface -           -                                        -
.Available                                            property  b           true                                     emits-change writable
xyz.openbmc_project.State.Decorator.OperationalStatus interface -           -                                        -
.Functional                                           property  b           true                                     emits-change
xyz.openbmc_project.Telemetry.Report                  interface -           -                                        -
.Readings                                             property  (ta(ssdt))  0 1 "PeakReading" "" 80.933 0            emits-change
```
Change-Id: I0a4f7eb0a5db688f32bf80954839140da9bb7e2a Signed-off-by: Harshit Aghera <haghera@nvidia.com>
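For readers unfamiliar with the D-Bus signature (ta(ssdt)) reported for the Readings property, the sketch below shows one way it could map onto C++ types: t is a uint64_t, s a string, d a double, and a(...) an array of structs. In the busctl output, the leading 0 is the outer t value and the 1 is the array length. The alias names, the field comments and the findReading helper are assumptions for illustration and are not taken from the dbus-sensors code.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// One array element, D-Bus struct (ssdt). Field meanings are assumed.
using Reading = std::tuple<std::string, // s: name, e.g. "PeakReading"
                           std::string, // s: second string, empty in the log
                           double,      // d: reading value
                           uint64_t>;   // t: timestamp

// The whole property, D-Bus struct (ta(ssdt)).
using Readings = std::tuple<uint64_t, std::vector<Reading>>;

// Hypothetical helper: returns the value of the named reading, or the
// fallback when no entry with that name exists.
inline double findReading(const Readings& readings, const std::string& name,
                          double fallback)
{
    for (const Reading& entry : std::get<1>(readings))
    {
        if (std::get<0>(entry) == name)
        {
            return std::get<2>(entry);
        }
    }
    return fallback;
}
```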
|
| aba6fcac | 29-Sep-2025 |
Ed Tanous <etanous@nvidia.com> |
Fix tidy build
This appears to be something tidy is wrong about. The suggestion of adding math to the struct initializers appears to not compile.
Move the calculation of hysteresisTrigger and hysteresisPublish into the constructor body itself to avoid the warning.
Change-Id: I833fd12966c69c0e081692d6d40ba0cf1805ead1 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 87a0745b | 03-Sep-2025 |
Ed Tanous <etanous@nvidia.com> |
Move Nvidia gpu tests
These tests got caught in the refactor. Move these tests to the correct location.
Change-Id: Ie8ec10e154d60cb4f24e1f45be36240863438f87 Signed-off-by: Ed Tanous <etanous@nvidia.com>
|
| 6061bbcf | 03-Sep-2025 |
Ed Tanous <etanous@nvidia.com> |
Remove main
Unit tests don't build if main is enabled.
Change-Id: I4c7210b2a72032d6e15729b5ab5e4201739dd602 Signed-off-by: Ed Tanous <etanous@nvidia.com> |