We had an outage. Lemdit fell over while I was asleep, so the timing was bad. It looks like it was down for about 4 hours.

I’ll look into what caused it. I have a script that tries to automatically recover Lemdit from the usual crash, but something else happened here.

Anyway, if you tried to access it and couldn’t, sorry! It’s back now.

Edit:

I believe this was caused by the cache consuming all available RAM (impressive, considering we’ve got 128 GB allocated). This isn’t normally supposed to cause an issue, as cache is meant to be cleared to make room for app usage, but in practice it can be problematic, and it’s likely what made everything fall over.

I’ve now got a cron job in place that clears the cache daily, so this shouldn’t happen again.
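
For anyone wanting to do something similar, here is a minimal sketch of what such a job could look like. It is not the actual script running on Lemdit. It assumes the cache in question is the Linux page cache, and it adds a threshold check that the daily job described above does not have. The function names and the 0.8 threshold are made up for illustration; `/proc/meminfo` and `/proc/sys/vm/drop_caches` are the standard kernel interfaces.

```python
import os

# Standard Linux kernel interface; writing "1" drops clean page cache only.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def cache_fraction(info):
    """Fraction of total RAM currently held by page cache plus buffers."""
    return (info.get("Cached", 0) + info.get("Buffers", 0)) / info["MemTotal"]


def drop_page_cache(path=DROP_CACHES_PATH):
    """Flush dirty pages to disk, then ask the kernel to drop clean page cache.

    Needs root when pointed at the real kernel path.
    """
    os.sync()
    with open(path, "w") as f:
        f.write("1\n")


def run_once(threshold=0.8):
    """Drop the page cache if it exceeds the (hypothetical) threshold."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if cache_fraction(info) >= threshold:
        drop_page_cache()
        return True
    return False
```

To use it, you would add a `run_once()` call at the bottom of the script and schedule it from root's crontab. Dropping the cache is non-destructive since only clean pages are discarded, but reads will be slower for a while afterwards.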

Here’s a graph if you’re curious. The outage occurred at ~3:30 AM; the drop you see is me restarting the VM:

    • delendum@lemdit.com (OP, mod) · 11 months ago

      Of course, it’s called Munin - http://munin-monitoring.org/

      It is very capable out of the box, and it takes plugins to monitor just about anything else. On Debian it’s available through the repository: install munin on the server and munin-node on the machines you want to monitor (both can be on the same machine). Config is relatively easy too.

      I use a Raspberry Pi as my Munin server. Munin is very light on resources, so it works great.
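
To give a rough idea of the plugin mechanism mentioned above: a Munin plugin is just an executable that prints graph settings when called with `config` and prints `field.value` lines otherwise. Below is a minimal sketch, assuming a Linux node. The field name `cached`, the graph title, and the function names are made up for illustration, and this is not one of the plugins running on my setup.

```python
import sys


def cached_kb(meminfo_text):
    """Return the 'Cached:' value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("Cached:"):
            return int(line.split()[1])
    raise ValueError("no Cached: line found")


def plugin_output(args, meminfo_text):
    """Build the text a Munin plugin prints for the given arguments."""
    if args and args[0] == "config":
        # Graph metadata, requested by munin-node with the "config" argument.
        return "\n".join([
            "graph_title Page cache size",
            "graph_vlabel bytes",
            "graph_category system",
            "cached.label cached",
        ])
    # Normal run: report the current value in bytes.
    return "cached.value %d" % (cached_kb(meminfo_text) * 1024)


def main():
    """Entry point a real plugin would call when executed."""
    with open("/proc/meminfo") as f:
        print(plugin_output(sys.argv[1:], f.read()))
```

A real plugin would end with a call to `main()`, be made executable, and be linked into the node's plugin directory.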