KDE KSysGuard NVIDIA GPU Sensors - see comments below for usage information
#!/usr/bin/env bash
# -------------------------------------------------------------------------
#
#  Created by Fonic (https://github.com/fonic)
#  Date: 12/29/19 - 02/12/20
#
#  Created for and tested on a single-GPU system equipped with an NVIDIA
#  GeForce RTX 2060 SUPER. For other systems, modifications might be
#  required.
#
#  Based on:
#  https://gist.github.com/frantic1048/41f56fd6328fa83ce6ad5acb3a4c0336
#  https://gist.github.com/hacker1024/c01a773f50769bd8216fa01ea0a1ef33
#  https://techbase.kde.org/Development/Tutorials/Sensors
#
# -------------------------------------------------------------------------
# Globals
MAX_GPU_TEMP=100 # maximum GPU temperature in °C
MAX_MEM_USED=8192 # maximum memory used in MB
MAX_GPU_CLOCK=2100 # maximum GPU clock in MHz
MAX_MEM_CLOCK=7000 # maximum memory clock in MHz
MAX_FAN_RPM=3000 # maximum fan speed in RPM
# Main loop
echo "ksysguardd 1.2.0"
echo -n "ksysguardd> "
last_update=$((${SECONDS} - 1))
while read -r input; do
    # Update monitor data just in time, but not more often than once per second
    # NOTE:
    # nvidia-settings will stop printing data on the first error it encounters.
    # Thus, queries for [fan:1] have to be placed at the end of the list for the
    # script to work on systems with only one fan.
    if (( ${SECONDS} < ${last_update} || ${SECONDS} - ${last_update} >= 1 )); then
        readarray -t lines < <(nvidia-settings -t \
            -q [gpu:0]/GPUCoreTemp \
            -q [gpu:0]/UsedDedicatedGPUMemory \
            -q [gpu:0]/GPUCurrentClockFreqs \
            -q [gpu:0]/GPUUtilization \
            -q [fan:0]/GPUCurrentFanSpeed \
            -q [fan:0]/GPUCurrentFanSpeedRPM \
            -q [fan:1]/GPUCurrentFanSpeed \
            -q [fan:1]/GPUCurrentFanSpeedRPM \
        )
        gpu_temp="${lines[0]}"
        mem_used="${lines[1]}"
        gpu_clock="${lines[2]%,*}"
        mem_clock="${lines[2]#*,}"
        re="^graphics=([0-9]+), memory=([0-9]+), video=([0-9]+), PCIe=([0-9]+)$"
        if [[ "${lines[3]}" =~ ${re} ]]; then
            gpu_load="${BASH_REMATCH[1]}"
            mem_load="${BASH_REMATCH[2]}"
            vpu_load="${BASH_REMATCH[3]}"
            pcie_load="${BASH_REMATCH[4]}"
        fi
        fan0_load="${lines[4]}"
        fan0_rpm="${lines[5]}"
        fan1_load="${lines[6]}"
        fan1_rpm="${lines[7]}"
        last_update=${SECONDS}
    fi
    # Evaluate input, generate output
    case "${input}" in
        # List of monitors (format: '<id>\t<data-type>')
        "monitors")
            echo -e "gpu_temp\tinteger"
            echo -e "mem_used\tinteger"
            echo -e "gpu_clock\tinteger"
            echo -e "mem_clock\tinteger"
            echo -e "gpu_load\tinteger"
            echo -e "mem_load\tinteger"
            echo -e "vpu_load\tinteger"
            echo -e "pcie_load\tinteger"
            echo -e "fan0_load\tinteger"
            echo -e "fan0_rpm\tinteger"
            echo -e "fan1_load\tinteger"
            echo -e "fan1_rpm\tinteger"
            ;;
        # Monitor info (format: '<label>\t<min-value>\t<max-value>\t<unit>')
        "gpu_temp?")  echo -e "GPU\t0\t${MAX_GPU_TEMP}\t°C" ;;
        "mem_used?")  echo -e "MEM\t0\t${MAX_MEM_USED}\tMB" ;;
        "gpu_clock?") echo -e "GPU\t0\t${MAX_GPU_CLOCK}\tMHz" ;;
        "mem_clock?") echo -e "MEM\t0\t${MAX_MEM_CLOCK}\tMHz" ;;
        "gpu_load?")  echo -e "GPU\t0\t100\t%" ;;
        "mem_load?")  echo -e "MEM\t0\t100\t%" ;;
        "vpu_load?")  echo -e "VPU\t0\t100\t%" ;;
        "pcie_load?") echo -e "PCIe\t0\t100\t%" ;;
        "fan0_load?") echo -e "FAN1\t0\t100\t%" ;;
        "fan0_rpm?")  echo -e "FAN1\t0\t${MAX_FAN_RPM}\tRPM" ;;
        "fan1_load?") echo -e "FAN2\t0\t100\t%" ;;
        "fan1_rpm?")  echo -e "FAN2\t0\t${MAX_FAN_RPM}\tRPM" ;;
        # Monitor data (format: '<value>')
        "gpu_temp")  echo "${gpu_temp}" ;;
        "mem_used")  echo "${mem_used}" ;;
        "gpu_clock") echo "${gpu_clock}" ;;
        "mem_clock") echo "${mem_clock}" ;;
        "gpu_load")  echo "${gpu_load}" ;;
        "mem_load")  echo "${mem_load}" ;;
        "vpu_load")  echo "${vpu_load}" ;;
        "pcie_load") echo "${pcie_load}" ;;
        "fan0_load") echo "${fan0_load}" ;;
        "fan0_rpm")  echo "${fan0_rpm}" ;;
        "fan1_load") echo "${fan1_load}" ;;
        "fan1_rpm")  echo "${fan1_rpm}" ;;
        "exit"|"quit") break ;;
    esac
    # Renew prompt
    echo -n "ksysguardd> "
done

fonic commented Dec 29, 2019

Usage (replace <user> with your username):

  1. Save this script as:
    /home/<user>/.local/share/ksysguard/nvidia-sensors.sh
  2. Edit the script and adjust the maximum values below the # Globals comment to your liking (optional)
  3. Make the script executable:
    $ chmod +x /home/<user>/.local/share/ksysguard/nvidia-sensors.sh
  4. Either proceed with steps 5-11 or use this tab file instead
  5. Open KSysGuard
  6. Select File -> New Tab...
  7. Adjust Title, Rows, Columns and Update interval to your liking (optional)
    Click OK
  8. Select File -> Monitor Remote Machine...
  9. For Host, enter nvidia
    For Connection Type, select Custom command
    For Command, enter /home/<user>/.local/share/ksysguard/nvidia-sensors.sh
    Click OK
  10. In the Sensor Browser, a new group named nvidia should now appear
  11. Drag & drop sensors from nvidia onto the Drop Sensor Here fields

Matching tab file for this script: KDE KSysGuard Tab for NVIDIA GPU Sensors.
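
Tip (not part of the original steps): you can sanity-check the script before hooking it up to KSysGuard by speaking the ksysguardd protocol to it from a terminal. The sensor values shown here are just illustrative:

  $ ~/.local/share/ksysguard/nvidia-sensors.sh
  ksysguardd 1.2.0
  ksysguardd> monitors
  gpu_temp    integer
  mem_used    integer
  ...
  ksysguardd> gpu_temp?
  GPU    0    100    °C
  ksysguardd> gpu_temp
  45
  ksysguardd> exit

A sensor id answers with its current value, the id followed by ? answers with label, minimum, maximum and unit, and monitors lists all available sensors.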

(Screenshot: KSysGuard tab showing the NVIDIA GPU sensors.)


fonic commented Feb 13, 2020

@kamelie1706: Thanks for testing. It throws an error, but still returns a valid value - I'd guess KSysGuard is ok with that and does not care about output on stderr, but I think I'll add a comment as you suggested.

> By the way, have you noticed there is now an nvidia plugin under development in the KSysGuard tree?
> https://github.com/KDE/ksysguard/tree/master/plugins/process/nvidia

No, I did not know that. But from looking at the code, I'd say it's only very basic and will probably stay that way due to the utility they use (nvidia-smi), which only reports very basic data. But still, I think it's great that they are thinking about native support.


fonic commented Feb 14, 2020

> Yes and no ... this is the state of the code but it seems you can get pretty much what you want from nvidia-smi
> https://gist.github.com/Sporif/4ce63f7b6eea691bdbb18905a9589169

Not if you want to use nvidia-smi dmon (which is what they are using). Then you are limited to what nvidia-smi dmon -s pucvmet provides.
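
For reference, a one-shot sample of those metrics can be taken like this (one line of power, temperature, utilization, clock, memory and PCIe stats per GPU; the exact columns depend on the driver version):

  nvidia-smi dmon -s pucvmet -c 1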

@hacker1024

Thanks for extending this further!
The memory clock is inaccurate for me - it seems to show exactly half of what the NVIDIA settings app displays.
The load sensors also always stay at 0.


fonic commented May 30, 2020

@hacker1024:

> Thanks for extending this further!

You're welcome ;)

> The memory clock is inaccurate for me - it seems to show exactly half of what the NVIDIA settings app displays.

That's by design. It shows the actual memory clock while the NVIDIA settings app displays what they call Memory Transfer Rate, which is memory clock * 2 (due to DDR). My guess is that this was a marketing decision - bigger numbers are better numbers... ;)

If you like the big numbers, you can change

mem_clock="${lines[2]#*,}"

to

mem_clock="${lines[2]#*,}"
mem_clock=$((mem_clock * 2))

> The load sensors also always stay at 0.

That can be fixed. What does nvidia-settings -t -q [gpu:0]/GPUUtilization output on your system?
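
For context, the regex in the script expects that query to print a single line of exactly this form (the values here are just examples):

  graphics=4, memory=10, video=0, PCIe=1

If the driver prints the fields in a different order or with different names, the regex does not match, the four load variables are never assigned, and the load sensors stay at 0.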

@dkinneyBU

Yours was a lot easier to implement than your predecessors'. Works great, thank you.


fonic commented Jun 29, 2021

@dkinneyBU:

> Yours was a lot easier to implement than your predecessors'. Works great, thank you.

Thanks, much appreciated!


BETLOG commented Jul 11, 2021

+1 for: nice!
I grouped some sensors and ended up liking this layout.
https://gist.github.com/BETLOG/85f069c248daa1ccfd8b4996a0bb5b28

(Screenshot: BETLOG's grouped sensor layout.)


alexanderhelbok commented Feb 23, 2022

Cool script, thank you!
Any chance you could implement reading the Power draw of the GPU?
It can be read using:
nvidia-smi --query-gpu=power.draw --format=csv
but I haven't managed to read it using nvidia-settings (as you do in your script)


fonic commented Feb 23, 2022

@alexanderhelbok: you're welcome.

Just had a quick look at nvidia-smi. My script could now be completely rewritten to use that instead of nvidia-settings. Back in 2019 when I initially created the script, nvidia-smi was still in its infancy and could only query/display a handful of metrics, thus nvidia-settings was the only way to go.

However, I'm quite certain that I won't rewrite it - KSysGuard is deprecated in favor of Plasma System Monitor anyway, thus the script will become deprecated as well in the foreseeable future.
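
For anyone who wants to experiment anyway, the query part could be swapped to something along these lines (an untested sketch; field names as listed by nvidia-smi --help-query-gpu):

  # Untested sketch: query everything in one nvidia-smi call;
  # values come back comma-separated on a single line per GPU
  IFS=', ' read -r gpu_temp mem_used gpu_clock mem_clock gpu_load power_draw < <(
      nvidia-smi --query-gpu=temperature.gpu,memory.used,clocks.gr,clocks.mem,utilization.gpu,power.draw \
                 --format=csv,noheader,nounits)

The parsing of GPUCurrentClockFreqs and GPUUtilization in the main loop would have to be adjusted accordingly, and a matching power_draw monitor would need to be added to the case statement.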


alexanderhelbok commented Feb 23, 2022

True fair enough, time to switch to Plasma System Monitor then ^^


fonic commented Feb 24, 2022

> True fair enough, time to switch to Plasma System Monitor then ^^

... which already has built-in NVIDIA support, albeit quite limited. I think what Plasma System Monitor can currently monitor for NVIDIA GPUs is exactly what nvidia-smi used to provide back then. If I find the time, I might consider contributing to Plasma System Monitor and extend NVIDIA support further.

@fubarhouse

This is seriously cool, thank you!
I wish System Monitor was this extensive, but I might have to switch back for detailed metrics now!


fonic commented Nov 30, 2022

@fubarhouse Thanks, glad you like it. I'm actually still using KSysGuard and this script myself - last time I checked System Monitor wasn't really to my liking.
