Sunday, May 19, 2019

Limit the maximum CPU frequency to avoid thermal shutdowns under Ubuntu 18.04

I have had similar problems with several laptops: over time, their CPUs seem to overheat more and more easily until the machine shuts down. Neither replacing the CPU fan nor applying quality thermal paste ever helped in these situations. So far I had simply limited the maximum frequency on Ubuntu, but if you leave your laptop doing some processing under the sun for a moment, the whole laptop body can overheat and the machine eventually shuts down anyway.
I learned that the newest laptops with Intel chips don't work properly with cpufreq-set, only with the likwid tools.
To install the package:
sudo apt install likwid
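I wrote the following Python script (manipulate_cpu_freq.py) to decrease or increase the maximum CPU frequency under Ubuntu 18.04. It requires Python 3.7, because the capture_output argument of subprocess.run was added in 3.7; the default python3 on Ubuntu 18.04 is 3.6, so install the python3.7 package: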

#!/usr/bin/env python3.7
import argparse
import subprocess


def flag(value):
  # Treat "1"/"true"/"yes" as on; type=bool would turn any non-empty string, even "0", into True
  return value.lower() in ("1", "true", "yes")


parser = argparse.ArgumentParser(description="Manipulate CPU frequencies")
parser.add_argument("-d", "--decrease", help="decrease the max frequency", type=flag, default=False)
parser.add_argument("-i", "--increase", help="increase the max frequency", type=flag, default=False)
parser.add_argument("-s", "--silent", help="silent mode", type=flag, default=False)
args = parser.parse_args()

# The second output line of "likwid-setFrequencies -l" lists the available frequencies (GHz)
query_freqs_output = subprocess.run(["likwid-setFrequencies", "-l"], capture_output=True)
query_freqs_output = query_freqs_output.stdout.decode("utf-8").split("\n")[1]
# split() without arguments also ignores trailing spaces, which would break float('')
available_freqs = [float(f) for f in query_freqs_output.split()]

# Take the value after the last '/' on the second output line of "likwid-setFrequencies -p"
query_curr_freq_output = subprocess.run(["likwid-setFrequencies", "-p"], capture_output=True)
query_curr_freq_output = query_curr_freq_output.stdout.decode("utf-8").split("\n")[1]
query_curr_freq_output = query_curr_freq_output.split("/")[-1]
current_freq = float(query_curr_freq_output.split()[0])
# Index of the available frequency closest to the current one
curr_freq_index = min(range(len(available_freqs)), key=lambda i: abs(available_freqs[i] - current_freq))

if not args.silent:
  print("Available frequencies:", available_freqs)
  print("Current frequency:", current_freq)

if args.decrease:
  print("Decrease the frequency")
  if curr_freq_index == 0:
    # Without this else branch, index -1 would jump to the highest frequency
    print("Warning: Can't decrease the frequency because it is already at min")
  else:
    print("Set to frequency", available_freqs[curr_freq_index - 1], "GHz")
    subprocess.run(["likwid-setFrequencies", "-y", str(available_freqs[curr_freq_index - 1])])

if args.increase:
  print("Increase the frequency")
  if curr_freq_index == len(available_freqs) - 1:
    # Without this else branch, index + 1 would raise an IndexError
    print("Warning: Can't increase the frequency because it is already at max")
  else:
    print("Set to frequency", available_freqs[curr_freq_index + 1], "GHz")
    subprocess.run(["likwid-setFrequencies", "-y", str(available_freqs[curr_freq_index + 1])])
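If you want to test the parsing and stepping logic without root or likwid, it can be pulled out into pure functions. This is a minimal sketch, assuming the output layout the script above already relies on (frequencies on the second line of `likwid-setFrequencies -l`, and the value after the last '/' on the second line of `-p`). The function names and the sample strings are hypothetical, not real likwid output captured from a machine:

```python
from typing import List, Optional


def parse_available_freqs(list_output: str) -> List[float]:
    """Frequencies (GHz) from the second line of `likwid-setFrequencies -l`."""
    # split() without arguments drops trailing spaces and empty fields
    return [float(tok) for tok in list_output.split("\n")[1].split()]


def parse_current_max(print_output: str) -> float:
    """First number after the last '/' on the second line of `-p` output."""
    last_field = print_output.split("\n")[1].split("/")[-1]
    return float(last_field.split()[0])


def next_max_freq(available: List[float], current: float, step: int) -> Optional[float]:
    """Neighbouring frequency one step down (-1) or up (+1), or None at the limit."""
    # Snap to the closest available frequency, as the script does
    index = min(range(len(available)), key=lambda i: abs(available[i] - current))
    target = index + step
    if 0 <= target < len(available):
        return available[target]
    return None  # already at min (step=-1) or at max (step=+1)


if __name__ == "__main__":
    # Hypothetical sample output, shaped like what the script parses
    freqs = parse_available_freqs("Available frequencies:\n1.2 1.6 2.0 2.6 \n")
    cur = parse_current_max(
        "Current CPU frequencies:\n"
        "CPU 0: governor powersave min/cur/max 0.4/2.6/2.6 GHz Turbo 1\n"
    )
    print(freqs, cur)                     # [1.2, 1.6, 2.0, 2.6] 2.6
    print(next_max_freq(freqs, cur, -1))  # 2.0
    print(next_max_freq(freqs, cur, +1))  # None
```

Returning None at either end makes the "already at min/max" case explicit, instead of relying on list indexing, where -1 silently wraps around to the last element.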
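I also run a script in the background (run_cpu_policy.sh) that monitors the CPU temperature. It calls manipulate_cpu_freq.py by name through sudo, so the Python script has to be executable and placed in a directory on sudo's secure_path, such as /usr/local/bin: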

#!/bin/bash
while true; do
  CPU_TEMP=$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)
  echo "CPU Temperature: $((CPU_TEMP / 1000))°C"
  if [ "$CPU_TEMP" -gt 76000 ]; then
    echo "Decrease the max CPU frequency"
    sudo manipulate_cpu_freq.py -s 1 -d 1
  elif [ "$CPU_TEMP" -le 68000 ]; then
    echo "Increase the max CPU frequency"
    sudo manipulate_cpu_freq.py -s 1 -i 1
  fi
  sleep 10
done
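The loop above is a simple hysteresis controller: between the two thresholds it leaves the limit alone, so the frequency does not flip back and forth on every 10-second sample. Here is a minimal Python sketch of the same decision, using the thresholds from run_cpu_policy.sh in millidegrees Celsius (the unit the sysfs temp file reports). The helper names decide and read_temp are hypothetical:

```python
from typing import Optional

HIGH_MILLI_C = 76000  # above this: lower the max frequency one step
LOW_MILLI_C = 68000   # at or below this: raise it one step


def decide(temp_milli_c: int) -> Optional[str]:
    """Return "decrease", "increase" or None (leave the limit alone)."""
    if temp_milli_c > HIGH_MILLI_C:
        return "decrease"
    if temp_milli_c <= LOW_MILLI_C:
        return "increase"
    return None  # inside the band between the two thresholds


def read_temp(path: str = "/sys/devices/virtual/thermal/thermal_zone0/temp") -> int:
    """Read a sysfs temperature file (integer millidegrees Celsius)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    for t in (80000, 72000, 68000, 50000):
        print(t // 1000, "°C ->", decide(t))
```

A Python version of the loop would call read_temp() every 10 seconds and run manipulate_cpu_freq.py with -d 1 or -i 1 depending on the result. Widening the gap between the thresholds makes the limit change less often, at the cost of reacting later.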
Of course, you must check which sysfs file (e.g. /sys/devices/virtual/thermal/thermal_zone0/temp) reports your CPU temperature and adapt the script above accordingly. I increase the maximum CPU frequency when the temperature is at or below 68°C and decrease it when it is above 76°C. This is a very conservative policy, but if the temperature sits above 80°C permanently, it can quickly climb above 100°C (around the thermal shutdown threshold), so I try to keep it below 80°C at all times, just to be sure.
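To find the right file, you can list every thermal zone together with its type and current reading. This is a minimal sketch, assuming the standard sysfs layout where each /sys/class/thermal/thermal_zoneN directory (a link into /sys/devices/virtual/thermal) contains a type and a temp file; the function name list_thermal_zones is hypothetical:

```python
import glob
import os
from typing import List, Tuple


def list_thermal_zones(base: str = "/sys/class/thermal") -> List[Tuple[str, str, float]]:
    """Return (zone directory, zone type, temperature in °C) for each readable zone."""
    zones = []
    # sorted() compares names as strings, so thermal_zone10 comes before thermal_zone2
    for zone_dir in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone_dir, "type")) as f:
                zone_type = f.read().strip()
            with open(os.path.join(zone_dir, "temp")) as f:
                temp_c = int(f.read().strip()) / 1000  # sysfs reports millidegrees
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or report no number
        zones.append((zone_dir, zone_type, temp_c))
    return zones


if __name__ == "__main__":
    for zone_dir, zone_type, temp_c in list_thermal_zones():
        print(f"{zone_dir}/temp  {zone_type:<20} {temp_c:.1f}°C")
```

On many Intel laptops the zone of type x86_pkg_temp reports the CPU package temperature, but it is worth comparing the values with another tool, such as sensors from lm-sensors, while the CPU is under load before settling on one.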
I had to develop this solution yesterday, after two thermal shutdowns on a hot, sunny day while my laptop's CPU (Intel i7-6600U) was continuously running intensive computations.
You can start the script at every boot by adding a line like this to the system crontab (/etc/crontab):
@reboot root systemd-run --scope sudo -u YOUR_USER screen -dmS cpu_policy /home/YOUR_USER/run_cpu_policy.sh
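The script must be executable (chmod +x). Because it calls sudo inside a detached screen session, YOUR_USER also has to be allowed to run manipulate_cpu_freq.py through sudo without a password (a NOPASSWD rule in sudoers); otherwise sudo just waits for a password that nobody types. Finally, make sure screen is installed: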
sudo apt install screen
To check on it while it is running, attach to the screen session (detach again with Ctrl+A, then D):
screen -r cpu_policy