I run an old desktop mainboard as my homelab server. It runs Ubuntu smoothly at loads between 0.2 and 3 (whatever unit that is).

Problem:
Occasionally, the CPU load skyrockets above 400 (yes really), making the machine totally unresponsive. The only solution is the reset button.

Solution:

  • I haven’t found what the cause might be, but I think that a reboot every few days would prevent it from ever happening. That could be done easily with a crontab line.
  • Alternatively, I would like to have some dead-simple script running in the background that just looks at the CPU load and reboots the machine when the load climbs over a given threshold.

--> How could such a CPU-load-triggered reboot be implemented?


edit: I asked ChatGPT to help me create a script that is started by crontab every X minutes. The script has a kill threshold that does a kill -9 on the top process, and a higher reboot threshold that … reboots the machine. Before doing either of these, or neither, it writes a log line. I hope this will keep my system running, and I will review the log file to see how it fares. Or it might inexplicably break my system. Fun!
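
For what it's worth, the two-threshold logic described in the edit could be sketched roughly like this in Python. This is a minimal dry-run sketch, not the actual script: the threshold values (20 and 100) and the names `decide` and `log_line` are made up for illustration, and it only prints the log line instead of killing anything or rebooting.

```python
import os
import time

# Hypothetical thresholds; the thread only says normal load is 0.2 to 3
# and the runaway load goes above 400.
KILL_THRESHOLD = 20.0
REBOOT_THRESHOLD = 100.0


def decide(load, kill_threshold=KILL_THRESHOLD, reboot_threshold=REBOOT_THRESHOLD):
    """Map a load average to one of 'none', 'kill' or 'reboot'."""
    if load >= reboot_threshold:
        return "reboot"
    if load >= kill_threshold:
        return "kill"
    return "none"


def log_line(load, action, now=None):
    """Format one log line; 'now' is seconds since the epoch (default: current time)."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} load1={load:.2f} action={action}"


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    load1, _, _ = os.getloadavg()
    # Dry run: only report what would happen, do not kill or reboot.
    print(log_line(load1, decide(load1)))
```

Run from cron every few minutes with the output appended to a log file, this shows how often each threshold would have fired. Actually acting on the result (for example `os.kill` with `SIGKILL`, or calling `systemctl reboot`) is probably best added only after the log suggests the thresholds are sensible.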

  • Possibly linux@lemmy.zip
    1 year ago

    Here’s a better suggestion. Why don’t you see if you can find out what’s causing the issue? It sounds like a problem occurring in userspace. Try running htop.

    • PlutoniumAcid@lemmy.worldOP
      1 year ago

      You know, you are right, and I’ve tried. I can monitor manually, but the spike never happens while I’m watching. I don’t know yet what causes it; I can only assume it’s one of the Docker containers, because the machine is doing nothing else.

      I am doing this to find out how often it happens, how quickly it happens, and what’s at the top when it happens.

      • vegetaaaaaaa@lemmy.world
        1 year ago

        I can manually monitor but it doesn’t happen just then

        Set up proper monitoring with history. That way you don’t have to babysit the server; you can just look at the charts after a crash. I usually go with netdata.

  • cron@feddit.de
    1 year ago

    Just as a side note, a high load average can also mean that processes are limited by I/O rather than by the CPU:

    Unix systems traditionally just counted processes waiting for the CPU, but Linux also counts processes waiting for other resources – for example, processes waiting to read from or write to the disk.

    Source
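
One way to tell the two cases apart while the load is high is to count processes by state. The following is a rough sketch, assuming a Linux `/proc` filesystem; the function names are made up for illustration. It counts only the main thread of each process, whereas the kernel's load figure counts every task, so the numbers are indicative rather than exact.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or ')',
    so split on the last ')' and take the first field after it.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R = runnable, D = uninterruptible sleep, ...)."""
    counts = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts  # no /proc here, e.g. not a Linux system
    for name in entries:
        if not name.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, name, "stat"), errors="replace") as fh:
                counts[parse_state(fh.read())] += 1
        except (OSError, IndexError):
            continue  # process exited while we were reading
    return counts


if __name__ == "__main__":
    c = count_states()
    print(f"runnable (R): {c['R']}, uninterruptible, usually I/O wait (D): {c['D']}")
```

If the D count is large while R stays small, the load is probably coming from processes stuck waiting on disk or some other resource rather than from CPU contention.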

    • teawrecks@sopuli.xyz
      1 year ago

      I would assume that wouldn’t cause so much contention that the system is unusable, though, right? Unless they’re busy waiting.

    • PlutoniumAcid@lemmy.worldOP
      1 year ago

      Nope, haven’t. It says I have 2 GB of swap on a 16 GB RAM system, and that seems reasonable.

      Why would you recommend turning swap off?

      • marcos@lemmy.world
        1 year ago

        To check whether your problem is caused by excessive memory usage that forces constant swapping. If it is, turning swap off will get some process killed by the kernel’s out-of-memory handling instead of slowing the whole computer down.
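
A less drastic check than turning swap off is to watch how fast pages move in and out of swap. This is a minimal sketch, assuming Linux and the `pswpin` / `pswpout` counters in `/proc/vmstat`; the function names and the one-second interval are made up for illustration.

```python
import time

INTERVAL = 1.0  # seconds between the two samples (arbitrary choice)


def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def swap_rate(before, after, seconds):
    """Return (pages swapped in per second, pages swapped out per second)."""
    swapped_in = after.get("pswpin", 0) - before.get("pswpin", 0)
    swapped_out = after.get("pswpout", 0) - before.get("pswpout", 0)
    return swapped_in / seconds, swapped_out / seconds


def read_vmstat(path="/proc/vmstat"):
    with open(path) as fh:
        return parse_vmstat(fh.read())


if __name__ == "__main__":
    try:
        first = read_vmstat()
        time.sleep(INTERVAL)
        second = read_vmstat()
    except OSError:
        print("could not read /proc/vmstat (not Linux?)")
    else:
        rate_in, rate_out = swap_rate(first, second, INTERVAL)
        print(f"swap in: {rate_in:.1f} pages/s, swap out: {rate_out:.1f} pages/s")
```

Rates that stay near zero most of the time but climb steeply just before the machine locks up would point towards memory pressure; rates that stay flat would suggest looking elsewhere.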