Aztec Ops Toolkit

Production-grade monitoring, alerting, and backup automation for Aztec node operators. Keep your infrastructure healthy 24/7.

Network: Aztec Mainnet
Setup time: 1-2 hours
Difficulty: 🟢 Easy

System checklist

Dependencies: curl, jq, cron, Docker (the scripts assume the node runs in a container)
Telegram: Bot token + Chat ID
Disk: ~100MB for logs/backups

Launch prerequisites

  • A running Aztec node in Docker (Sequencer, Prover, or Slasher)
  • Basic shell scripting knowledge
  • Telegram account for alerts
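
Before you start, you can confirm that the tools from the checklist are on your PATH. The following is a minimal sketch. check_deps is a hypothetical helper. cron is checked through its crontab command, and docker is included on the assumption that your node runs in a container, as the scripts below expect.

```shell
# Hypothetical helper: report every listed command that is missing from PATH
# and return non-zero if at least one is missing.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_deps curl jq crontab docker && echo "All dependencies found"`. Each missing command is printed to stderr.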

Key features

  • Health check watchdog with auto-restart
  • Telegram alerts for critical events
  • Automated keystore backups
  • Disk space monitoring
  • Log rotation and cleanup

Overview

Running a node is one thing; keeping it healthy 24/7 is another. This toolkit provides shell scripts for:

Component | Purpose
--- | ---
Watchdog | Monitor node health, auto-restart on failure
Telegram Alerts | Instant notifications for critical events
Backups | Automated keystore and config backups
Disk Monitor | Alert before you run out of space
Log Rotation | Prevent logs from eating your disk

1. Telegram Bot Setup

First, create a Telegram bot to receive alerts.

Create Bot

  1. Open Telegram, search for @BotFather
  2. Send /newbot
  3. Follow prompts to name your bot
  4. Save the API Token (looks like 123456789:ABCdefGHI...)

Get Chat ID

  1. Start a chat with your new bot
  2. Send any message to it
  3. Run this command:
curl -s "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates" | jq '.result[0].message.chat.id'
  4. Save the Chat ID (a number like 123456789)
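
Both values are easy to paste with a stray space or a missing character, so a quick shape check before saving them can catch typos. The following is a sketch. The function names are hypothetical, and the patterns assume the usual formats: a bot token is the numeric bot ID, a colon, and a secret made of letters, digits, `_` or `-`. Chat IDs are integers, and group chats use negative IDs.

```shell
# Hypothetical helpers: check the *shape* of the values only; Telegram may
# still reject a well-formed token or chat ID.
# Bot token: numeric bot ID, a colon, then letters, digits, "_" or "-".
valid_bot_token() {
  printf '%s\n' "$1" | grep -Eqx '[0-9]+:[A-Za-z0-9_-]+'
}

# Chat ID: an integer; private chats are positive, group chats are negative.
valid_chat_id() {
  printf '%s\n' "$1" | grep -Eqx -- '-?[0-9]+'
}
```

For example, `valid_chat_id "$CHAT_ID" || echo "chat ID looks wrong"` before writing it into the config file.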

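Create Config File

The example paths assume the node and this toolkit run as root; adjust BACKUP_DIR and KEYSTORE_PATH if yours differ.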

mkdir -p ~/aztec-ops
cat > ~/aztec-ops/.env << 'EOF'
# Telegram Configuration
TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN_HERE"
TELEGRAM_CHAT_ID="YOUR_CHAT_ID_HERE"

# Node Configuration
NODE_NAME="aztec-sequencer-01"
NODE_CONTAINER="aztec-sequencer" # Docker container name

# Thresholds
DISK_ALERT_THRESHOLD=85 # Alert when disk usage > 85%
MEMORY_ALERT_THRESHOLD=90 # Alert when memory > 90%

# Backup Configuration
BACKUP_DIR="/root/aztec-backups"
KEYSTORE_PATH="/root/.aztec/keystore"
EOF
chmod 600 ~/aztec-ops/.env
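
If a placeholder is left unedited, you usually only find out later, when an alert fails. It therefore helps to confirm that the file sets every value before wiring up the scripts. The following is a sketch. require_vars is a hypothetical helper, and it treats any value of the form YOUR_..._HERE from the template above as unset.

```shell
# Hypothetical helper: return non-zero if any named variable is empty, unset,
# or still holds a template placeholder (YOUR_*_HERE).
require_vars() {
  local name value rc=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup that works in bash and POSIX sh
    case $value in
      "" | YOUR_*_HERE)
        echo "config not set: $name" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}
```

Usage: `source ~/aztec-ops/.env && require_vars TELEGRAM_BOT_TOKEN TELEGRAM_CHAT_ID NODE_NAME NODE_CONTAINER BACKUP_DIR KEYSTORE_PATH`.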

2. Telegram Alert Function

Create a reusable alert function:

cat > ~/aztec-ops/telegram-alert.sh << 'EOF'
#!/bin/bash
# Telegram Alert Script for Aztec Nodes

source ~/aztec-ops/.env

send_alert() {
local level="$1" # INFO, WARN, ERROR, CRITICAL, SUCCESS
local message="$2"

local emoji=""
case "$level" in
INFO) emoji="ℹ️" ;;
WARN) emoji="⚠️" ;;
ERROR) emoji="❌" ;;
CRITICAL) emoji="🚨" ;;
SUCCESS) emoji="✅" ;;
*) emoji="📢" ;;
esac

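local timestamp=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
local formatted_message="${emoji} *${level}* | ${NODE_NAME}

${message}

_${timestamp}_"

local api_url="https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage"

# --data-urlencode keeps &, + and % in the message intact; --fail returns non-zero
# when Telegram rejects the request; --max-time stops a hung request from blocking cron
if ! curl -s --fail --max-time 10 -X POST "$api_url" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
--data-urlencode text="${formatted_message}" \
-d parse_mode="Markdown" \
> /dev/null 2>&1; then
# Legacy Markdown rejects unpaired _ or * (e.g. in file paths): retry as plain text
curl -s --fail --max-time 10 -X POST "$api_url" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
--data-urlencode text="${formatted_message}" \
> /dev/null 2>&1
fi
}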

# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Direct execution - send test message
send_alert "INFO" "Test alert from Aztec Ops Toolkit"
echo "Test alert sent!"
fi
EOF
chmod +x ~/aztec-ops/telegram-alert.sh

Test It

~/aztec-ops/telegram-alert.sh

You should receive a test message on Telegram.
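
send_alert uses Telegram's legacy Markdown mode. That mode rejects a message containing an unpaired `_` or `*` outside of formatting, which is easy to hit with file paths or container names such as `/var/log/aztec_node.log`. Telegram's documentation for this mode says `_`, `*`, `` ` `` and `[` can be escaped with a preceding backslash. The following is a sketch of a helper (tg_escape_md is a hypothetical name) for values you interpolate into a message.

```shell
# Hypothetical helper: backslash-escape the characters that Telegram's legacy
# Markdown mode treats as markup ( [ _ * ` ) so they are displayed literally.
tg_escape_md() {
  printf '%s' "$1" | sed 's/[[_*`]/\\&/g'
}
```

For example: `send_alert "INFO" "Large log file: $(tg_escape_md "$path")"`. Escape only the interpolated values, not the `*bold*` markup you add on purpose.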


3. Node Health Watchdog

This script checks that the node container is running and responding, restarts it when health checks keep failing, and warns on high memory usage.

cat > ~/aztec-ops/watchdog.sh << 'EOF'
#!/bin/bash
# Aztec Node Watchdog - Health check and auto-restart

source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh

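LOG_FILE="/var/log/aztec-watchdog.log"
MAX_RESTART_ATTEMPTS=3 # Max automatic restarts per hour
RESTART_COOLDOWN=300 # 5 minutes between restart attempts
# Each cron run is a fresh process, so restart times are persisted in this file
RESTART_STATE_FILE="/var/tmp/aztec-watchdog.restarts"

log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

check_container_running() {
docker ps --format '{{.Names}}' | grep -q "^${NODE_CONTAINER}$"
}

check_container_healthy() {
# The probe runs inside the container, so the image must include curl.
# Adjust the endpoint, and confirm it returns 200 on a healthy node before relying on auto-restart.
local health_check=$(docker exec "$NODE_CONTAINER" curl -s --max-time 5 -o /dev/null -w "%{http_code}" http://localhost:8080/health 2>/dev/null)
[[ "$health_check" == "200" ]]
}

check_sync_status() {
# Optional helper, not called by main(): looks for sync activity in recent logs (adjust for your node type)
local sync_status=$(docker logs "$NODE_CONTAINER" --tail 10 2>&1 | grep -c "syncing\|processing")
[[ "$sync_status" -gt 0 ]]
}

restart_container() {
local now=$(date +%s)
local last=$(tail -n 1 "$RESTART_STATE_FILE" 2>/dev/null)
if (( now - ${last:-0} < RESTART_COOLDOWN )); then
log "Restart skipped: cooldown of ${RESTART_COOLDOWN}s still active"
return 1
fi

# The one-hour window for MAX_RESTART_ATTEMPTS is an assumption; tune it to your needs
local recent=$(awk -v cutoff=$(( now - 3600 )) '$1 >= cutoff' "$RESTART_STATE_FILE" 2>/dev/null | wc -l)
if (( recent >= MAX_RESTART_ATTEMPTS )); then
log "Restart skipped: ${recent} restarts in the last hour"
send_alert "CRITICAL" "Auto-restart limit reached (${MAX_RESTART_ATTEMPTS} per hour). Manual intervention required."
return 1
fi
echo "$now" >> "$RESTART_STATE_FILE"

log "Attempting to restart $NODE_CONTAINER..."
docker restart "$NODE_CONTAINER"
sleep 30 # Wait for container to start

if check_container_running; then
log "Container restarted successfully"
send_alert "SUCCESS" "Node restarted successfully after health check failure"
return 0
else
log "Container failed to restart"
send_alert "CRITICAL" "Container failed to restart! Manual intervention required."
return 1
fi
}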

# Main watchdog logic
main() {
# Check 1: Is container running?
if ! check_container_running; then
log "CRITICAL: Container $NODE_CONTAINER is not running!"
send_alert "CRITICAL" "Container is DOWN! Attempting restart..."

# The fallback uses Compose v2 ("docker compose"); switch to docker-compose if you still run v1
docker start "$NODE_CONTAINER" 2>/dev/null || docker compose -f ~/aztec/docker-compose.yml up -d
sleep 30

if check_container_running; then
send_alert "SUCCESS" "Container started successfully"
exit 0
fi
send_alert "CRITICAL" "Failed to start container! Manual intervention required."
exit 1
fi

# Check 2: Is container healthy?
if ! check_container_healthy; then
log "WARNING: Container health check failed"

# Give it another chance
sleep 10
if ! check_container_healthy; then
send_alert "WARN" "Health check failing - monitoring closely"

# Check if this is persistent
sleep 60
if ! check_container_healthy; then
send_alert "ERROR" "Persistent health check failure - restarting"
restart_container
exit $?
fi
fi
fi

# Check 3: Memory usage (awk handles the decimal comparison, so bc is not required)
local mem_usage=$(docker stats "$NODE_CONTAINER" --no-stream --format "{{.MemPerc}}" 2>/dev/null | tr -d '%')
if [[ -n "$mem_usage" ]] && awk -v m="$mem_usage" -v t="$MEMORY_ALERT_THRESHOLD" 'BEGIN { exit !(m > t) }'; then
log "WARNING: High memory usage: ${mem_usage}%"
send_alert "WARN" "High memory usage: ${mem_usage}%"
fi

log "Health check passed"
}

main "$@"
EOF
chmod +x ~/aztec-ops/watchdog.sh
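
Check 2 retries the health probe with nested sleeps and ifs. If you want to tune the number of attempts and the delay, you can factor the pattern into a small helper. The following is a sketch that is not part of the toolkit. retry_check is a hypothetical name.

```shell
# Hypothetical helper: run a check up to ATTEMPTS times, sleeping DELAY seconds
# between tries. Succeeds on the first passing try; fails only if every try fails.
retry_check() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_check 3 30 check_container_healthy || restart_container` probes up to three times, 30 seconds apart, before restarting. The trade-off is that you lose the intermediate WARN alert that the nested version sends.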

4. Disk Space Monitor

cat > ~/aztec-ops/disk-monitor.sh << 'EOF'
#!/bin/bash
# Disk Space Monitor for Aztec Nodes

source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh

check_disk_space() {
local mount_point="${1:-/}"
[[ -d "$mount_point" ]] || return 0 # Skip paths that don't exist on this host
# -P (POSIX format) keeps each filesystem on one line, so NR==2 is always the data row
local usage=$(df -P "$mount_point" | awk 'NR==2 {print $5}' | tr -d '%')
local available=$(df -Ph "$mount_point" | awk 'NR==2 {print $4}')

if [[ "$usage" -ge "$DISK_ALERT_THRESHOLD" ]]; then
send_alert "WARN" "Disk space warning!

Mount: $mount_point
Usage: ${usage}%
Available: ${available}

Consider:
• Pruning unused Docker images: \`docker image prune -a\`
• Rotating logs
• Expanding storage"
return 1
fi

echo "Disk OK: ${usage}% used, ${available} available"
return 0
}

# Check main partitions
check_disk_space "/"
check_disk_space "/var/lib/docker" 2>/dev/null || true

# Check for large log files
large_logs=$(find /var/log -type f -size +500M 2>/dev/null)
if [[ -n "$large_logs" ]]; then
send_alert "INFO" "Large log files detected:
$large_logs

Consider log rotation."
fi
EOF
chmod +x ~/aztec-ops/disk-monitor.sh
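
Cron runs disk-monitor.sh every hour, so a disk that stays above DISK_ALERT_THRESHOLD sends the same warning every hour. One way to rate-limit it is to keep a timestamp file per alert. The following is a sketch. The helper name, the key names and the default directory are assumptions.

```shell
# Hypothetical helper: return 0 ("send it") at most once per INTERVAL seconds
# for each KEY, remembering the last send time in a stamp file.
THROTTLE_DIR="${THROTTLE_DIR:-/var/tmp/aztec-ops}"

should_alert() {
  local key=$1 interval=$2 stamp now last=""
  stamp="$THROTTLE_DIR/$key.last"
  now=$(date +%s)
  # If state cannot be stored, fail open: a duplicate alert beats a missing one
  mkdir -p "$THROTTLE_DIR" 2>/dev/null || return 0
  if [ -f "$stamp" ]; then
    last=$(cat "$stamp")
  fi
  if [ -n "$last" ] && [ $((now - last)) -lt "$interval" ]; then
    return 1
  fi
  echo "$now" > "$stamp" 2>/dev/null
  return 0
}
```

For example, `should_alert disk_root 21600 && send_alert "WARN" "Disk space warning!"` re-alerts at most every 6 hours. Pick a separate key per mount point.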

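5. Automated Backups

This script archives the keystore and node config files into BACKUP_DIR and deletes archives older than 7 days.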

cat > ~/aztec-ops/backup.sh << 'EOF'
#!/bin/bash
# Aztec Keystore and Config Backup Script

source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh

TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="aztec-backup-${TIMESTAMP}"

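# Keystore archives contain private keys: create backups readable by the owner only
umask 077
mkdir -p "$BACKUP_DIR"
chmod 700 "$BACKUP_DIR"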

backup_keystore() {
if [[ -d "$KEYSTORE_PATH" ]]; then
echo "Backing up keystore..."
if tar -czf "${BACKUP_DIR}/${BACKUP_NAME}-keystore.tar.gz" -C "$(dirname "$KEYSTORE_PATH")" "$(basename "$KEYSTORE_PATH")"; then
echo "Keystore backup created: ${BACKUP_NAME}-keystore.tar.gz"
return 0
else
send_alert "ERROR" "Keystore backup failed!"
return 1
fi
else
echo "Keystore path not found: $KEYSTORE_PATH"
send_alert "ERROR" "Keystore path not found: $KEYSTORE_PATH"
return 1
fi
}

backup_configs() {
echo "Backing up configurations..."

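# Paths relative to $HOME, so ~/aztec/.env and ~/aztec-ops/.env keep
# separate paths in the archive instead of overwriting each other
local config_files=(
aztec/docker-compose.yml
aztec/.env
aztec-ops/.env
)

local existing=()
for file in "${config_files[@]}"; do
if [[ -f "$HOME/$file" ]]; then
existing+=("$file")
fi
done

if (( ${#existing[@]} == 0 )); then
echo "No config files found"
return 1
fi

tar -czf "${BACKUP_DIR}/${BACKUP_NAME}-configs.tar.gz" -C "$HOME" "${existing[@]}"
echo "Config backup created: ${BACKUP_NAME}-configs.tar.gz"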
}

cleanup_old_backups() {
echo "Cleaning up backups older than 7 days..."
find "$BACKUP_DIR" -maxdepth 1 -type f -name "aztec-backup-*" -mtime +7 -delete
}

# Main
echo "Starting Aztec backup..."

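status=0
backup_keystore || status=1
backup_configs || status=1
cleanup_old_backups

# Calculate backup size
backup_size=$(du -sh "$BACKUP_DIR" | cut -f1)
backup_count=$(ls -1 "$BACKUP_DIR" | wc -l)

if (( status == 0 )); then
send_alert "SUCCESS" "Backup completed successfully

Files: ${backup_count}
Total size: ${backup_size}
Location: ${BACKUP_DIR}"
echo "Backup complete!"
else
send_alert "ERROR" "Backup finished with errors. Check the script output for details.

Files: ${backup_count}
Location: ${BACKUP_DIR}"
echo "Backup finished with errors"
fi

exit "$status"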
EOF
chmod +x ~/aztec-ops/backup.sh
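
A backup you cannot extract is not a backup. A cheap check after each run is to list the archive, because listing makes tar decompress it end to end. The following is a sketch. verify_archive is a hypothetical helper, and its optional second argument requires an entry with a given prefix, such as the keystore directory.

```shell
# Hypothetical helper: succeed only if ARCHIVE is a readable .tar.gz and, when
# PREFIX is given, it contains an entry starting with PREFIX. tar -tzf reads
# the whole archive, so truncation or corruption makes it fail.
verify_archive() {
  local archive=$1 prefix=${2:-} listing
  listing=$(tar -tzf "$archive" 2>/dev/null) || return 1
  if [ -n "$prefix" ]; then
    printf '%s\n' "$listing" | grep -q "^$prefix" || return 1
  fi
  return 0
}
```

For example, add this to backup.sh after backup_keystore: `verify_archive "${BACKUP_DIR}/${BACKUP_NAME}-keystore.tar.gz" "$(basename "$KEYSTORE_PATH")/" || send_alert "ERROR" "Keystore archive failed verification"`.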

6. Log Rotation

cat > ~/aztec-ops/log-rotate.sh << 'EOF'
#!/bin/bash
# Log Rotation for Aztec Nodes

# Report Docker disk usage (images, containers, local volumes, build cache)
echo "Docker disk usage:"
docker system df 2>/dev/null

# Truncate logs of running containers that exceed 100 MB (104857600 bytes)
for container in $(docker ps -q); do
log_file=$(docker inspect --format='{{.LogPath}}' "$container" 2>/dev/null)
if [[ -f "$log_file" ]] && [[ $(stat -c%s "$log_file" 2>/dev/null) -gt 104857600 ]]; then
echo "Truncating log for container: $(docker inspect --format='{{.Name}}' "$container")"
truncate -s 0 "$log_file"
fi
done

# Remove stopped containers, unused networks, dangling images and build cache.
# --volumes is left out on purpose: it also prunes local volumes that no container
# references, which can hold node data if the node's containers were removed.
echo "Cleaning up unused Docker resources..."
docker system prune -f 2>/dev/null

echo "Log rotation complete"
EOF
chmod +x ~/aztec-ops/log-rotate.sh
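
log-rotate.sh handles Docker's container logs, but the toolkit's own files under /var/log keep growing: aztec-watchdog.log, aztec-ops.log and aztec-backup.log, which the cron jobs append to. On most distributions logrotate is the standard tool for this. If you prefer to keep everything inside the toolkit, you can sketch size-based rotation in shell. rotate_log is hypothetical, and the sizes in the usage line are arbitrary.

```shell
# Hypothetical helper: size-based rotation. When FILE exceeds MAX_BYTES, shift
# FILE.1 -> FILE.2 ... (keeping KEEP rotated copies), move FILE to FILE.1 and
# start a new empty FILE.
rotate_log() {
  local file=$1 max_bytes=$2 keep=${3:-4} size i
  [ -f "$file" ] || return 0
  size=$(( $(wc -c < "$file") ))   # arithmetic strips any padding from wc
  [ "$size" -gt "$max_bytes" ] || return 0
  i=$keep
  while [ "$i" -gt 1 ]; do
    if [ -f "$file.$((i - 1))" ]; then
      mv -f "$file.$((i - 1))" "$file.$i"
    fi
    i=$((i - 1))
  done
  mv -f "$file" "$file.1"
  : > "$file"
}
```

Usage: `for f in /var/log/aztec-watchdog.log /var/log/aztec-ops.log /var/log/aztec-backup.log; do rotate_log "$f" 10485760 4; done`. A job that is still writing while a rotation runs keeps writing to FILE.1 until it exits. Because the cron jobs are short-lived, the next run writes to the new file.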

7. Set Up Cron Jobs

Schedule the scripts with cron. The entries below assume the toolkit lives in /root/aztec-ops and run from root's crontab:

# Open crontab editor
crontab -e

Add these lines (the daily-report.sh entry uses the script created in step 8):

# Aztec Ops Toolkit - Automated Tasks
# ===================================

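# Watchdog - Every 5 minutes (watchdog.sh already writes to /var/log/aztec-watchdog.log, so cron output goes to aztec-ops.log to avoid duplicate lines)
*/5 * * * * /root/aztec-ops/watchdog.sh >> /var/log/aztec-ops.log 2>&1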

# Disk monitor - Every hour
0 * * * * /root/aztec-ops/disk-monitor.sh >> /var/log/aztec-ops.log 2>&1

# Backup - Daily at 3 AM
0 3 * * * /root/aztec-ops/backup.sh >> /var/log/aztec-backup.log 2>&1

# Log rotation - Weekly on Sunday at 4 AM
0 4 * * 0 /root/aztec-ops/log-rotate.sh >> /var/log/aztec-ops.log 2>&1

# Daily health report - Every day at 9 AM
0 9 * * * /root/aztec-ops/daily-report.sh >> /var/log/aztec-ops.log 2>&1
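
Cron starts a new copy of each job on schedule, even if the previous copy is still running. That can happen, for example, when docker commands hang on an unresponsive daemon. flock(1) from util-linux can skip a run while an earlier one still holds a lock. The following is a sketch of a wrapper. run_exclusive is a hypothetical name, and the lock path in the usage line is an assumption.

```shell
# Hypothetical wrapper: run a command only if no other process holds LOCKFILE.
# flock -n gives up immediately instead of waiting, so an overlapping run is skipped.
run_exclusive() {
  local lockfile=$1
  shift
  (
    if ! flock -n 9; then
      echo "skipped: $lockfile is held by a previous run" >&2
      exit 1
    fi
    "$@"
  ) 9>"$lockfile"
}
```

Inside a script you would call `run_exclusive /var/lock/aztec-watchdog.lock /root/aztec-ops/watchdog.sh`. In the crontab itself, flock's command form does the same: `*/5 * * * * flock -n /var/lock/aztec-watchdog.lock /root/aztec-ops/watchdog.sh`, followed by your existing log redirect.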

8. Daily Health Report

cat > ~/aztec-ops/daily-report.sh << 'EOF'
#!/bin/bash
# Daily Health Report for Aztec Nodes

source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh

# Gather metrics
uptime_info=$(uptime -p)
disk_usage=$(df -h / | awk 'NR==2 {print $5}')
memory_info=$(free -h | awk 'NR==2 {printf "Used: %s / Total: %s", $3, $2}')
container_status=$(docker ps --filter "name=${NODE_CONTAINER}" --format "{{.Status}}" 2>/dev/null | head -1)
container_status="${container_status:-Not running}"

# Docker stats
if docker ps -q --filter "name=${NODE_CONTAINER}" | grep -q .; then
docker_stats=$(docker stats "${NODE_CONTAINER}" --no-stream --format "CPU: {{.CPUPerc}} | Mem: {{.MemUsage}}" 2>/dev/null)
summary="_All systems operational_"
else
docker_stats="Container not running"
summary="_Attention: node container is not running_"
fi

# Check last backup
last_backup=$(ls -t "${BACKUP_DIR}"/aztec-backup-* 2>/dev/null | head -1)
if [[ -n "$last_backup" ]]; then
backup_date=$(stat -c %y "$last_backup" | cut -d'.' -f1)
backup_info="Last: $backup_date"
else
backup_info="No backups found"
fi

# Build report
report="📊 *Daily Health Report*

🖥️ *System*
• Uptime: ${uptime_info}
• Disk: ${disk_usage} used
• Memory: ${memory_info}

🐳 *Container: ${NODE_CONTAINER}*
• Status: ${container_status}
• ${docker_stats}

💾 *Backups*
• ${backup_info}

---
${summary}"

send_alert "INFO" "$report"
EOF
chmod +x ~/aztec-ops/daily-report.sh
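
The report shows when the last backup ran but does not flag a stale one. Because backup.sh runs daily at 3 AM, a newest archive much older than a day suggests the backup job is failing. The following is a sketch of a helper that computes the age in hours. backup_age_hours is hypothetical, and it relies on GNU stat, as daily-report.sh already does.

```shell
# Hypothetical helper: print the age, in whole hours, of the newest
# aztec-backup-* file in DIR, or return non-zero if there is none.
backup_age_hours() {
  local dir=$1 newest="" f
  for f in "$dir"/aztec-backup-*; do
    [ -e "$f" ] || continue   # the glob stays literal when nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] || return 1
  echo $(( ($(date +%s) - $(stat -c %Y "$newest")) / 3600 ))
}
```

For example: `if age=$(backup_age_hours "$BACKUP_DIR") && (( age > 26 )); then backup_info="${backup_info} (STALE: ${age}h old)"; fi`. The 26-hour threshold is an assumption that leaves a margin over the daily schedule.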

9. Quick Commands Reference

Create a helper script with common commands:

cat > ~/aztec-ops/aztec-ops << 'EOF'
#!/bin/bash
# Aztec Ops - Quick command wrapper

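# Default container comes from the toolkit config, falling back to aztec-sequencer
source ~/aztec-ops/.env 2>/dev/null
DEFAULT_CONTAINER="${NODE_CONTAINER:-aztec-sequencer}"

case "$1" in
status)
docker ps --filter "name=aztec"
;;
logs)
docker logs -f --tail 100 "${2:-$DEFAULT_CONTAINER}"
;;
restart)
docker restart "${2:-$DEFAULT_CONTAINER}"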
;;
backup)
~/aztec-ops/backup.sh
;;
alert)
source ~/aztec-ops/telegram-alert.sh
send_alert "${2:-INFO}" "${3:-Manual alert}"
;;
report)
~/aztec-ops/daily-report.sh
;;
disk)
~/aztec-ops/disk-monitor.sh
;;
*)
echo "Aztec Ops Toolkit"
echo ""
echo "Usage: aztec-ops <command> [args]"
echo ""
echo "Commands:"
echo " status Show running Aztec containers"
echo " logs [name] Follow container logs"
echo " restart [name] Restart container"
echo " backup Run backup now"
echo " alert [level] [msg] Send Telegram alert"
echo " report Send daily health report"
echo " disk Check disk space"
;;
esac
EOF
chmod +x ~/aztec-ops/aztec-ops
sudo ln -sf ~/aztec-ops/aztec-ops /usr/local/bin/aztec-ops

Now you can use:

aztec-ops status
aztec-ops logs
aztec-ops backup
aztec-ops report

Verification Checklist

After setup, verify everything works:

# 1. Test Telegram alerts
~/aztec-ops/telegram-alert.sh

# 2. Run watchdog manually
~/aztec-ops/watchdog.sh

# 3. Check disk monitor
~/aztec-ops/disk-monitor.sh

# 4. Test backup
~/aztec-ops/backup.sh

# 5. Send daily report
~/aztec-ops/daily-report.sh

# 6. Verify cron jobs are scheduled
crontab -l | grep aztec

Troubleshooting

Telegram alerts not working

# Load TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID into the current shell
source ~/aztec-ops/.env

# Test API connection
curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe"

# Check chat ID is correct
curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates" | jq
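
When a request fails, the Bot API response says why. Successful calls return `"ok":true`, and failed ones return `"ok":false` with a description. Common descriptions are `Unauthorized` (bad token) and `Bad Request: chat not found` (wrong chat ID, or you have not messaged the bot yet). The following is a sketch that turns these into hints. tg_diagnose is hypothetical and recognizes only these common messages.

```shell
# Hypothetical helper: map a Bot API JSON response to a troubleshooting hint.
# Matching is by substring on the compact JSON that Telegram returns.
tg_diagnose() {
  case $1 in
    *'"ok":true'*)
      echo "OK: Telegram accepted the request"
      return 0 ;;
    *Unauthorized*)
      echo "Token rejected: re-check TELEGRAM_BOT_TOKEN" ;;
    *'chat not found'*)
      echo "Chat not found: send your bot a message, then re-check TELEGRAM_CHAT_ID" ;;
    *'bot was blocked by the user'*)
      echo "Bot blocked: unblock the bot in Telegram" ;;
    *)
      echo "Unrecognized response: $1" ;;
  esac
  return 1
}
```

Usage: `tg_diagnose "$(curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" -d chat_id="${TELEGRAM_CHAT_ID}" -d text=ping)"`.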

Watchdog not restarting container

# Check Docker permissions
docker ps

# Verify container name in .env matches actual container
docker ps --format '{{.Names}}'

Backups failing

# Load BACKUP_DIR and KEYSTORE_PATH from the toolkit config
source ~/aztec-ops/.env

# Check backup directory permissions
ls -la "$BACKUP_DIR"

# Verify keystore path exists
ls -la "$KEYSTORE_PATH"

Next Steps

  • Grafana Dashboard: Set up Monitoring Stack for visual metrics
  • PagerDuty Integration: Upgrade alerts for on-call rotation
  • Remote Backups: Sync backups to S3 or external storage

Your Aztec node now has automated health checks, Telegram alerting, disk monitoring, and daily backups.

© 2026 TokioStack. All rights reserved.