scx_utils: use proper NVML API for GPU NUMA node lookup #3312
devnexen wants to merge 1 commit into sched-ext:main
cc @arighi
htejun left a comment:
Please prove that this optimization is meaningful.
Use the wrapper, but still fall back to the slow(er) path just in case. New attempt after sched-ext#3108, only this time a new nvml-wrapper version has been released.
Signed-off-by: David Carlier <devnexen@gmail.com>
a8e0409 to a9caaee
To be honest, the performance win when I tried it was not large, mostly the saved filesystem operation time. I'm digging for the benchmark (hopefully I still have it somewhere). To summarize: the primary motivation is correctness rather than raw speed. The old code reverse-engineers a sysfs path from NVML's bus ID format by stripping a "0000" prefix. This is fragile, as the FIXME comment in the code acknowledges. numa_node_id() uses the proper NVML API (nvmlDeviceGetNumaNodeId) directly. The fallback to the sysfs path is kept for older NVML versions that don't support it.
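For readers following along, here is a minimal sketch of the lookup order described above: try the direct API first, then fall back to sysfs. It is not the code from this PR. `api_lookup` is a hypothetical stand-in for the nvml-wrapper call (its exact signature is not shown in this thread), `devices_dir` is assumed to be something like `/sys/bus/pci/devices`, and the helper names and the lowercasing of the bus ID are illustrative assumptions.

```rust
use std::fs;
use std::path::Path;

/// Convert an NVML-style PCI bus ID ("00000000:01:00.0", 8-digit domain)
/// into the sysfs form ("0000:01:00.0", 4-digit domain) by dropping the
/// leading "0000". Lowercasing is an assumption made for this sketch.
fn sysfs_bus_id(nvml_bus_id: &str) -> String {
    let id = nvml_bus_id.to_lowercase();
    match id.split_once(':') {
        Some((domain, rest)) if domain.len() == 8 && domain.starts_with("0000") => {
            format!("{}:{}", &domain[4..], rest)
        }
        // Anything else is passed through unchanged.
        _ => id,
    }
}

/// Fallback path: read `<devices_dir>/<bus id>/numa_node`.
/// The kernel reports -1 when the node is unknown, which maps to `None`.
fn sysfs_numa_node(devices_dir: &Path, nvml_bus_id: &str) -> Option<u32> {
    let path = devices_dir.join(sysfs_bus_id(nvml_bus_id)).join("numa_node");
    let text = fs::read_to_string(path).ok()?;
    let node: i64 = text.trim().parse().ok()?;
    u32::try_from(node).ok()
}

/// Try the direct API first (hypothetical closure standing in for the
/// NVML wrapper call); on any error, fall back to the sysfs lookup.
fn gpu_numa_node<E>(
    api_lookup: impl FnOnce() -> Result<u32, E>,
    devices_dir: &Path,
    nvml_bus_id: &str,
) -> Option<u32> {
    api_lookup()
        .ok()
        .or_else(|| sysfs_numa_node(devices_dir, nvml_bus_id))
}
```

Passing the API call as a closure keeps the sketch independent of any particular nvml-wrapper version. A caller on an older NVML would simply see the closure return an error and get the sysfs result instead.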
Here it is, as a gist. Locally it is only ~1.1 faster on average (I have only one NVIDIA GPU, though). Anyhow, using the now-available direct NVML wrapper call is, by itself, a valid reason IMHO.