UniFi OS Server Health Monitoring Script Software

Worried your UniFi OS Server might run out of RAM or Disk Space or have other health issues?

Use this script to get alerts via Pushover (don’t forget to add your own Pushover app token and user key).

Please adjust the CPU, disk space, RAM % and RAM trigger values to suit your server.

Note: in testing, UniFi OS Server (UOS) appears to use around 80 to 85% of its RAM under normal use, so reboot triggers (if set) should be 98% or higher.
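
Before choosing thresholds, it can help to see what your own server reports right now. The snippet below is a minimal sketch and not part of the monitoring script. It assumes a Linux host that exposes /proc/meminfo, and mem_used_percent is a hypothetical helper name that applies the same formula the script uses (total minus available, divided by total).

```shell
#!/bin/bash
# Hypothetical helper (not part of the main script): integer % of RAM in use,
# computed as (total - available) * 100 / total, truncated like the script does.
mem_used_percent() {
    local total="$1" available="$2"
    echo $(( (total - available) * 100 / total ))
}

# Read live figures (in kB) from /proc/meminfo when it is readable.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    # Only print when both values were found.
    if [ -n "$total_kb" ] && [ -n "$avail_kb" ]; then
        echo "RAM used now: $(mem_used_percent "$total_kb" "$avail_kb")%"
    fi
fi
```

Running this a few times over a normal day gives a baseline, so the alert and reboot percentages can be set comfortably above it.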

Optionally set ENABLE_AUTO_REBOOT to "true" to force a restart when RAM usage becomes critical, or leave it at "false" to only send alerts.
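
The reboot rule combines three conditions: the feature is enabled, RAM usage is at or above the reboot percentage, and the server has been up longer than the safety window. A minimal sketch of that decision, using a hypothetical helper name (should_reboot) and example values rather than live readings:

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's reboot rule. Returns 0 (reboot)
# only when all three conditions hold; otherwise returns 1.
should_reboot() {
    local enabled="$1" used_pct="$2" reboot_pct="$3" uptime_s="$4" min_uptime_s="$5"
    [ "$enabled" = "true" ] || return 1            # feature switched on?
    [ "$used_pct" -ge "$reboot_pct" ] || return 1  # RAM at or above the trigger?
    [ "$uptime_s" -gt "$min_uptime_s" ] || return 1 # past the post-boot safety window?
    return 0
}

# Example values only: 99% RAM used, 98% trigger, 2 hours up, 15-minute guard.
if should_reboot "true" 99 98 7200 900; then
    echo "would reboot"
else
    echo "would not reboot"
fi
```

With these example values the sketch prints "would reboot". Changing the uptime to 600 seconds shows the guard that prevents a reboot loop when RAM is already high just after boot.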

Follow the instructions below: add the script, then set up a cron job to run it every five minutes.

sudo apt update
sudo apt install curl
sudo apt install bc
sudo nano /usr/local/bin/check_unifi_resources.sh
*PASTE IN BASH SCRIPT FROM THE MAIN CODE BLOCK AND SAVE*
sudo chmod +x /usr/local/bin/check_unifi_resources.sh
*TEST WITH LINE BELOW - THERE MAY BE NO OUTPUT*
/usr/local/bin/check_unifi_resources.sh
*NOW CHECK THE LOG*
grep "Resource Monitor" /var/log/syslog
*IF AUTO-REBOOT IS ENABLED, USE "sudo crontab -e" BELOW INSTEAD, SINCE REBOOTING NEEDS ROOT*
crontab -e
*ADD LINE BELOW TO END OF CRON USING NANO EDITOR AND SAVE*
*/5 * * * * /usr/local/bin/check_unifi_resources.sh > /dev/null 2>&1
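
After saving the crontab, you may want to confirm that the entry is really there. This is a minimal sketch with a hypothetical helper name (has_monitor_entry). It reads crontab text on standard input, so it can be tried without touching the real crontab.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the crontab text on stdin contains an active
# (non-commented) line that runs check_unifi_resources.sh.
has_monitor_entry() {
    grep -v '^[[:space:]]*#' | grep -q 'check_unifi_resources\.sh'
}

# Check the current user's crontab; "crontab -l" fails if none is installed,
# in which case the helper sees no input and reports the entry as missing.
if crontab -l 2>/dev/null | has_monitor_entry; then
    echo "cron entry found"
else
    echo "cron entry missing"
fi
```

If the job was added to root's crontab, run the check with "sudo crontab -l" in place of "crontab -l".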

Below is the main code to paste into check_unifi_resources.sh at step 4 above (the nano command).

#!/bin/bash
#
# A script to monitor RAM, disk space, and CPU load and send a Pushover alert
# Includes optional auto-reboot on critical RAM usage.
# Compliments of ITProExpert.com & Ravtic.com
# Refined for Ubuntu 24.04/UniFi OS

# --- Configuration ---

# 1. Pushover Credentials
APP_TOKEN="****CHANGE TO YOUR OWN APP TOKEN IN PUSHOVER******"
USER_KEY="****CHANGE TO YOUR OWN USER TOKEN IN PUSHOVER******"

# Pushover Priority & Sound
# Priority: -2 (Lowest), -1 (Low), 0 (Normal), 1 (High)
PUSHOVER_PRIORITY="0"
# Sounds: pushover, bike, bugle, cashregister, classical, cosmic, falling, gamelan, incoming, intermission, magic, mechanical, pianobar, siren, spacealarm, tugboat, alien, climb, persistent, echo, updown, none
PUSHOVER_SOUND="pushover"

# 2. File Paths
MONITORED_PATH="/"

# 3. Alerting Thresholds

# RAM: Alert if AVAILABLE RAM is below this MB (Absolute check)
MIN_RAM_MB=1200  

# RAM: Alert if % USED RAM is above this % (Percentage check)
MAX_MEM_PERCENT=96

# CPU: Alert if 5-minute Load Average exceeds this % (relative to core count)
MAX_CPU_PERCENT=30

# Disk: Alert if FREE DISK is below this GB
MIN_DISK_GB=5   

# 4. Reboot Configuration
# Set to "true" to enable auto-reboot on critical RAM usage
ENABLE_AUTO_REBOOT="false" 

# Reboot if RAM % exceeds this value
REBOOT_RAM_PERCENT=98

# Safety: Minimum uptime (in seconds) before allowing a reboot (900s = 15 mins)
# This prevents infinite reboot loops if RAM is high immediately on boot.
MIN_UPTIME_SECONDS=900

# --- Do not edit below this line ---
PATH=/usr/bin:/bin:/usr/sbin:/sbin
LOCK_FILE="/tmp/unifi_alert_lock"
ALERT_COOLDOWN=3600 # 1 hour in seconds

# --- Sanity Checks ---

if [[ "$APP_TOKEN" == *"CHANGE TO"* ]] || [[ "$USER_KEY" == *"CHANGE TO"* ]]; then
    echo "Error: APP_TOKEN or USER_KEY is not set."
    logger "Resource Monitor: Config error - Tokens not set."
    exit 1
fi

if ! command -v bc &> /dev/null; then
    echo "Error: 'bc' is not installed. Run 'sudo apt install bc'."
    exit 1
fi

# --- Data Gathering ---

# 1. RAM Stats
MEM_STATS=$(free -m | grep Mem:)
AVAILABLE_RAM_MB=$(echo "$MEM_STATS" | awk '{print $7}')
TOTAL_RAM_MB=$(echo "$MEM_STATS" | awk '{print $2}')
MEM_USED_PERCENT=$(echo "scale=2; ($TOTAL_RAM_MB - $AVAILABLE_RAM_MB) * 100 / $TOTAL_RAM_MB" | bc | cut -d. -f1)

# 2. Disk Stats
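# --output=avail prints only the "Avail" column, so long device names that wrap cannot shift the fields
FREE_DISK_GB=$(df -BG --output=avail "$MONITORED_PATH" | tail -n 1 | tr -dc '0-9')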

# 3. CPU Stats (5-minute load avg normalized to core count)
CORE_COUNT=$(nproc)
LOAD_AVG_5MIN=$(awk '{print $2}' /proc/loadavg)
# Calculate percentage: (Load / Cores) * 100
CPU_LOAD_PERCENT=$(echo "scale=2; ($LOAD_AVG_5MIN / $CORE_COUNT) * 100" | bc | cut -d. -f1)

# 4. Top 3 Processes by Memory
# Format: "ProcessName (UsedMB, Used%)"
# We fetch 'rss' (KB) and divide by 1024 in awk to get MB
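# The ", " separator is added inside awk: tr maps single characters only, so it cannot turn a newline into ", "
TOP_PROCESSES=$(ps -eo comm,rss,%mem --sort=-%mem | head -n 4 | tail -n 3 | awk '{printf "%s%s (%.0fMB, %s%%)", (NR > 1 ? ", " : ""), $1, $2/1024, $3}')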

# --- Logic & Message Building ---

PROBLEM_DETECTED=0
REBOOT_TRIGGERED=0
ALERT_TITLE="Server Resource Alert"
MESSAGE="Status Report:\n"

# Check 1: RAM
if [ "$AVAILABLE_RAM_MB" -lt "$MIN_RAM_MB" ] || [ "$MEM_USED_PERCENT" -ge "$MAX_MEM_PERCENT" ]; then
    PROBLEM_DETECTED=1
    MESSAGE+="[!] MEMORY ISSUE: ${MEM_USED_PERCENT}% Used (Free: ${AVAILABLE_RAM_MB}MB)\n"
else
    MESSAGE+="[OK] Memory: ${MEM_USED_PERCENT}% Used\n"
fi

# Check 2: CPU
if [ "$CPU_LOAD_PERCENT" -ge "$MAX_CPU_PERCENT" ]; then
    PROBLEM_DETECTED=1
    MESSAGE+="[!] HIGH CPU (5m avg): ${CPU_LOAD_PERCENT}% (Threshold: ${MAX_CPU_PERCENT}%)\n"
else
    MESSAGE+="[OK] CPU Load: ${CPU_LOAD_PERCENT}%\n"
fi

# Check 3: Disk
if [ "$FREE_DISK_GB" -lt "$MIN_DISK_GB" ]; then
    PROBLEM_DETECTED=1
    MESSAGE+="[!] LOW DISK: ${FREE_DISK_GB}GB Free\n"
else
    MESSAGE+="[OK] Disk Space: ${FREE_DISK_GB}GB Free\n"
fi

# Append Process Data
MESSAGE+="\nTop RAM Hogs: ${TOP_PROCESSES}"

# --- Reboot Logic ---

if [ "$ENABLE_AUTO_REBOOT" == "true" ] && [ "$MEM_USED_PERCENT" -ge "$REBOOT_RAM_PERCENT" ]; then
    # Check Uptime Safety
    CURRENT_UPTIME=$(awk '{print $1}' /proc/uptime | cut -d. -f1)
    
    if [ "$CURRENT_UPTIME" -gt "$MIN_UPTIME_SECONDS" ]; then
        REBOOT_TRIGGERED=1
        PROBLEM_DETECTED=1
        ALERT_TITLE="CRITICAL: SERVER REBOOTING"
        MESSAGE="CRITICAL: RAM reached ${MEM_USED_PERCENT}%. System is rebooting now.\n\n${MESSAGE}"
    else
        MESSAGE+="\n(Reboot threshold met, but suppressed due to recent boot)"
    fi
fi

# --- Notification Sending ---

should_alert=0

# Determine if we should alert
if [ "$PROBLEM_DETECTED" -eq 1 ]; then
    # Always alert immediately if rebooting
    if [ "$REBOOT_TRIGGERED" -eq 1 ]; then
        should_alert=1
    elif [ -f "$LOCK_FILE" ]; then
        # Check Cooldown for standard alerts
        FILE_AGE=$(($(date +%s) - $(stat -c %Y "$LOCK_FILE")))
        if [ "$FILE_AGE" -gt "$ALERT_COOLDOWN" ]; then
            should_alert=1
        else
            logger "Resource Monitor: Issue detected, but in cooldown."
        fi
    else
        should_alert=1
    fi
fi

if [ "$should_alert" -eq 1 ]; then
    logger "Resource Monitor: Sending alert. Reboot: $REBOOT_TRIGGERED"
    
    curl -s --form-string "token=$APP_TOKEN" \
        --form-string "user=$USER_KEY" \
        --form-string "message=$(printf '%b' "$MESSAGE")" \
        --form-string "title=$ALERT_TITLE" \
        --form-string "priority=$PUSHOVER_PRIORITY" \
        --form-string "sound=$PUSHOVER_SOUND" \
        https://api.pushover.net/1/messages.json
    
    # Touch lock file to reset cooldown
    touch "$LOCK_FILE"

    # Perform Reboot if triggered
    if [ "$REBOOT_TRIGGERED" -eq 1 ]; then
        logger "Resource Monitor: Initiating Reboot Sequence."
        sync
        /sbin/reboot
    fi

elif [ "$PROBLEM_DETECTED" -eq 0 ] && [ -f "$LOCK_FILE" ]; then
    # Clear lock file if everything is back to normal
    rm "$LOCK_FILE"
fi
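
If you see "Issue detected, but in cooldown." in the log and no notification arrives, that is the lock file at work: the script compares the file's modification time with ALERT_COOLDOWN and stays quiet until an hour has passed. The check can be sketched on its own as follows, using a hypothetical helper name (cooldown_expired) and a throwaway temp file in place of the real lock file. Like the script, it assumes GNU stat.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when an alert may be sent, i.e. the lock file
# is absent or its modification time is older than the cooldown (in seconds).
cooldown_expired() {
    local lock_file="$1" cooldown_s="$2" age
    [ -f "$lock_file" ] || return 0
    age=$(( $(date +%s) - $(stat -c %Y "$lock_file") ))
    [ "$age" -gt "$cooldown_s" ]
}

# Demo with a freshly created temp file rather than /tmp/unifi_alert_lock.
demo_lock=$(mktemp)
if cooldown_expired "$demo_lock" 3600; then
    echo "alert allowed"
else
    echo "in cooldown"
fi
rm -f "$demo_lock"
```

Because the demo file was created a moment ago, the sketch prints "in cooldown". To force a fresh alert while testing the real script, delete /tmp/unifi_alert_lock before running it.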

Note: Make sure you have an offline backup of your UniFi Server settings in case of any failures. The script should work as described, but no liability is accepted.
