Aztec Ops Toolkit
Node briefing
Production-grade monitoring, alerting, and backup automation for Aztec node operators. Keep your infrastructure healthy 24/7.
Network: Aztec Mainnet
Setup time: 1-2 hours
Difficulty: 🟢 Easy
System checklist
- Dependencies: curl, jq, cron, bc (used by the watchdog's memory check)
- Telegram: Bot token + Chat ID
- Disk: ~100MB for logs/backups
Launch prerequisites
- Running Aztec node (Sequencer, Prover, or Slasher)
- Basic shell scripting knowledge
- Telegram account for alerts
Key features
- Health check watchdog with auto-restart
- Telegram alerts for critical events
- Automated keystore backups
- Disk space monitoring
- Log rotation and cleanup
Overview
Running a node is one thing. Keeping it healthy 24/7 is another. This toolkit provides battle-tested scripts for:
| Component | Purpose |
|---|---|
| Watchdog | Monitor node health, auto-restart on failure |
| Telegram Alerts | Instant notifications for critical events |
| Backups | Automated keystore and config backups |
| Disk Monitor | Alert before you run out of space |
| Log Rotation | Prevent logs from eating your disk |
1. Telegram Bot Setup
First, create a Telegram bot to receive alerts.
Create Bot
- Open Telegram and search for @BotFather
- Send /newbot
- Follow the prompts to name your bot
- Save the API Token (looks like 123456789:ABCdefGHI...)
Get Chat ID
- Start a chat with your new bot
- Send any message to it
- Run this command:
curl -s "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates" | jq '.result[0].message.chat.id'
- Save the Chat ID (a number like 123456789)
Create Config File
mkdir -p ~/aztec-ops
cat > ~/aztec-ops/.env << 'EOF'
# Telegram Configuration
TELEGRAM_BOT_TOKEN="YOUR_BOT_TOKEN_HERE"
TELEGRAM_CHAT_ID="YOUR_CHAT_ID_HERE"
# Node Configuration
NODE_NAME="aztec-sequencer-01"
NODE_CONTAINER="aztec-sequencer" # Docker container name
# Thresholds
DISK_ALERT_THRESHOLD=85 # Alert when disk usage > 85%
MEMORY_ALERT_THRESHOLD=90 # Alert when memory > 90%
# Backup Configuration
BACKUP_DIR="/root/aztec-backups"
KEYSTORE_PATH="/root/.aztec/keystore"
EOF
chmod 600 ~/aztec-ops/.env
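Every script below sources this file, so a value that is empty or still a placeholder only shows up later as a silent failure. A minimal sketch of an up-front check follows; `validate_config` is a hypothetical helper, not part of the toolkit, and the list of required variables is an assumption based on the config above.

```shell
# Sketch: fail fast when required settings are missing or still placeholders.
# validate_config is a hypothetical helper, not part of the toolkit.
# Usage: validate_config /path/to/.env
validate_config() {
  local env_file="$1" missing=0 name value
  [ -r "$env_file" ] || { echo "Cannot read $env_file" >&2; return 1; }
  # Load the file in a subshell so the caller's environment is not modified.
  (
    . "$env_file"
    for name in TELEGRAM_BOT_TOKEN TELEGRAM_CHAT_ID NODE_CONTAINER BACKUP_DIR KEYSTORE_PATH; do
      eval "value=\${$name:-}"   # read the variable whose name is in $name
      case "$value" in
        ""|YOUR_*)
          echo "Config problem: $name is unset or still a placeholder" >&2
          missing=1
          ;;
      esac
    done
    exit "$missing"
  )
}

# Example: validate_config ~/aztec-ops/.env || exit 1
```

Calling it at the top of each script would turn a forgotten placeholder into an immediate, readable error.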
2. Telegram Alert Function
Create a reusable alert function:
cat > ~/aztec-ops/telegram-alert.sh << 'EOF'
#!/bin/bash
# Telegram Alert Script for Aztec Nodes
source ~/aztec-ops/.env
send_alert() {
local level="$1" # INFO, WARN, ERROR, CRITICAL
local message="$2"
local emoji=""
case "$level" in
INFO) emoji="ℹ️" ;;
WARN) emoji="⚠️" ;;
ERROR) emoji="❌" ;;
CRITICAL) emoji="🚨" ;;
SUCCESS) emoji="✅" ;;
*) emoji="📢" ;;
esac
# -u so the printed time really is UTC, matching the label
local timestamp=$(date -u '+%Y-%m-%d %H:%M:%S UTC')
local formatted_message="${emoji} *${level}* | ${NODE_NAME}
${message}
_${timestamp}_"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
--data-urlencode text="${formatted_message}" \
-d parse_mode="Markdown" \
> /dev/null 2>&1
}
# Allow script to be sourced or run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
# Direct execution - send test message
send_alert "INFO" "Test alert from Aztec Ops Toolkit"
echo "Test alert sent!"
fi
EOF
chmod +x ~/aztec-ops/telegram-alert.sh
Test It
~/aztec-ops/telegram-alert.sh
You should receive a test message on Telegram.
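If the test works but later alerts go missing, one likely cause is the message text itself. With `parse_mode="Markdown"`, Telegram treats `_`, `*`, a backtick, and `[` as markup, and a stray one (for example in a file path or container name) can make the API reject the message. The sketch below escapes those characters in dynamic text; `escape_markdown` is a hypothetical helper, not part of the script above.

```shell
# Sketch: escape the characters Telegram's legacy Markdown mode treats as markup.
# escape_markdown is a hypothetical helper, not part of telegram-alert.sh.
escape_markdown() {
  # Prefix _, *, backtick and [ with a backslash so they are sent literally.
  printf '%s' "$1" | sed 's/[_*`[]/\\&/g'
}

# Example: send_alert "INFO" "Large file: $(escape_markdown "$path")"
```

Escape only the variable parts of a message and leave your own `*bold*` or `_italic_` markers alone, otherwise the formatting is escaped as well.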
3. Node Health Watchdog
This script monitors your node and restarts it if unhealthy.
cat > ~/aztec-ops/watchdog.sh << 'EOF'
#!/bin/bash
# Aztec Node Watchdog - Health check and auto-restart
source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh
LOG_FILE="/var/log/aztec-watchdog.log"
MAX_RESTART_ATTEMPTS=3
RESTART_COOLDOWN=300 # 5 minutes between restart attempts
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
check_container_running() {
docker ps --format '{{.Names}}' | grep -q "^${NODE_CONTAINER}$"
}
check_container_healthy() {
# Check if container is responding (adjust endpoint as needed)
local health_check=$(docker exec "$NODE_CONTAINER" curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health 2>/dev/null)
[[ "$health_check" == "200" ]]
}
check_sync_status() {
# Check if node is syncing (adjust based on your node type)
local sync_status=$(docker logs "$NODE_CONTAINER" --tail 10 2>&1 | grep -c "syncing\|processing")
[[ "$sync_status" -gt 0 ]]
}
restart_container() {
log "Attempting to restart $NODE_CONTAINER..."
docker restart "$NODE_CONTAINER"
sleep 30 # Wait for container to start
if check_container_running; then
log "Container restarted successfully"
send_alert "SUCCESS" "Node restarted successfully after health check failure"
return 0
else
log "Container failed to restart"
return 1
fi
}
# Main watchdog logic
main() {
local restart_attempts=0
# Check 1: Is container running?
if ! check_container_running; then
log "CRITICAL: Container $NODE_CONTAINER is not running!"
send_alert "CRITICAL" "Container is DOWN! Attempting restart..."
docker start "$NODE_CONTAINER" 2>/dev/null || docker-compose -f ~/aztec/docker-compose.yml up -d
sleep 30
if check_container_running; then
send_alert "SUCCESS" "Container started successfully"
else
send_alert "CRITICAL" "Failed to start container! Manual intervention required."
fi
exit 1
fi
# Check 2: Is container healthy?
if ! check_container_healthy; then
log "WARNING: Container health check failed"
# Give it another chance
sleep 10
if ! check_container_healthy; then
send_alert "WARN" "Health check failing - monitoring closely"
# Check if this is persistent
sleep 60
if ! check_container_healthy; then
send_alert "ERROR" "Persistent health check failure - restarting"
restart_container
fi
fi
fi
# Check 3: Memory usage
local mem_usage=$(docker stats "$NODE_CONTAINER" --no-stream --format "{{.MemPerc}}" 2>/dev/null | tr -d '%')
if [[ -n "$mem_usage" ]] && (( $(echo "$mem_usage > $MEMORY_ALERT_THRESHOLD" | bc -l) )); then
log "WARNING: High memory usage: ${mem_usage}%"
send_alert "WARN" "High memory usage: ${mem_usage}%"
fi
log "Health check cycle complete"
}
main "$@"
EOF
chmod +x ~/aztec-ops/watchdog.sh
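The watchdog defines `MAX_RESTART_ATTEMPTS` and `RESTART_COOLDOWN` but never consults them, so a node that keeps failing is restarted on every cron run. One way to enforce both limits is a small state file, sketched below. `restart_allowed`, `reset_restart_state`, and the state-file layout are assumptions for illustration, not part of the original script.

```shell
# Sketch: rate-limit automatic restarts using a small state file.
# restart_allowed and the state-file layout are illustrative assumptions.
RESTART_COOLDOWN=300      # seconds between restart attempts
MAX_RESTART_ATTEMPTS=3    # attempts allowed before a human must step in

# Usage: restart_allowed STATE_FILE [NOW_EPOCH]
# The state file holds "<last_restart_epoch> <attempt_count>".
# Returns 0 and records the attempt if a restart may proceed, 1 otherwise.
restart_allowed() {
  local state_file="$1" now="${2:-$(date +%s)}" last=0 attempts=0
  [ -f "$state_file" ] && read -r last attempts < "$state_file"
  [ $((now - last)) -lt "$RESTART_COOLDOWN" ] && return 1      # still cooling down
  [ "$attempts" -ge "$MAX_RESTART_ATTEMPTS" ] && return 1      # too many attempts
  echo "$now $((attempts + 1))" > "$state_file"
  return 0
}

# Call this after a clean health check so the counter starts over.
reset_restart_state() { rm -f "$1"; }

# Example inside the watchdog:
#   restart_allowed /var/tmp/aztec-watchdog.state && restart_container
```

Storing the state on disk matters because each cron run is a fresh process, so an in-memory counter would reset every five minutes.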
4. Disk Space Monitor
cat > ~/aztec-ops/disk-monitor.sh << 'EOF'
#!/bin/bash
# Disk Space Monitor for Aztec Nodes
source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh
check_disk_space() {
local mount_point="${1:-/}"
local usage=$(df -P "$mount_point" | awk 'NR==2 {print $5}' | tr -d '%')
local available=$(df -hP "$mount_point" | awk 'NR==2 {print $4}')
if [[ "$usage" -ge "$DISK_ALERT_THRESHOLD" ]]; then
send_alert "WARN" "Disk space warning!
Mount: $mount_point
Usage: ${usage}%
Available: ${available}
Consider:
• Pruning Docker images: \`docker system prune -a\`
• Rotating logs
• Expanding storage"
return 1
fi
echo "Disk OK: ${usage}% used, ${available} available"
return 0
}
# Check main partitions
check_disk_space "/"
check_disk_space "/var/lib/docker" 2>/dev/null || true
# Check for large log files
large_logs=$(find /var/log -type f -size +500M 2>/dev/null)
if [[ -n "$large_logs" ]]; then
send_alert "INFO" "Large log files detected:
$large_logs
Consider log rotation."
fi
EOF
chmod +x ~/aztec-ops/disk-monitor.sh
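When this script runs hourly, a disk that stays above the threshold produces the same warning every hour. A possible refinement is to alert only when the state changes, as in the sketch below. `state_changed` and the state-file location are assumptions for illustration.

```shell
# Sketch: only alert when the disk state changes (OK -> WARN or WARN -> OK).
# state_changed and the state-file location are illustrative assumptions.

# Usage: state_changed STATE_FILE NEW_STATE
# Returns 0 (and records NEW_STATE) when it differs from the stored state.
# A missing state file is treated as "OK", so a healthy first run stays quiet.
state_changed() {
  local state_file="$1" new_state="$2" old_state="OK"
  [ -f "$state_file" ] && old_state=$(cat "$state_file")
  [ "$new_state" = "$old_state" ] && return 1
  printf '%s\n' "$new_state" > "$state_file"
  return 0
}

# Example:
#   if state_changed /var/tmp/aztec-disk.state WARN; then
#     send_alert "WARN" "Disk usage crossed the threshold"
#   fi
```

The same helper also makes a "recovered" notification easy: call it with `OK` on a healthy run and alert when it returns 0.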
5. Automated Backups
cat > ~/aztec-ops/backup.sh << 'EOF'
#!/bin/bash
# Aztec Keystore and Config Backup Script
source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="aztec-backup-${TIMESTAMP}"
umask 077 # Archives contain key material; keep them readable by the owner only
mkdir -p "$BACKUP_DIR"
backup_keystore() {
if [[ -d "$KEYSTORE_PATH" ]]; then
echo "Backing up keystore..."
tar -czf "${BACKUP_DIR}/${BACKUP_NAME}-keystore.tar.gz" -C "$(dirname "$KEYSTORE_PATH")" "$(basename "$KEYSTORE_PATH")"
if [[ $? -eq 0 ]]; then
echo "Keystore backup created: ${BACKUP_NAME}-keystore.tar.gz"
return 0
else
send_alert "ERROR" "Keystore backup failed!"
return 1
fi
else
echo "Keystore path not found: $KEYSTORE_PATH"
return 1
fi
}
backup_configs() {
echo "Backing up configurations..."
local config_files=(
~/aztec/docker-compose.yml
~/aztec/.env
~/aztec-ops/.env
)
local temp_dir=$(mktemp -d)
for file in "${config_files[@]}"; do
if [[ -f "$file" ]]; then
cp "$file" "$temp_dir/"
fi
done
tar -czf "${BACKUP_DIR}/${BACKUP_NAME}-configs.tar.gz" -C "$temp_dir" .
rm -rf "$temp_dir"
echo "Config backup created: ${BACKUP_NAME}-configs.tar.gz"
}
cleanup_old_backups() {
echo "Cleaning up backups older than 7 days..."
find "$BACKUP_DIR" -name "aztec-backup-*" -mtime +7 -delete
}
# Main
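echo "Starting Aztec backup..."
# Track failures so a SUCCESS alert is never sent for a failed backup
status=0
backup_keystore || status=1
backup_configs || status=1
cleanup_old_backups
# Calculate backup size
backup_size=$(du -sh "$BACKUP_DIR" | cut -f1)
backup_count=$(ls -1 "$BACKUP_DIR" | wc -l)
if [[ $status -eq 0 ]]; then
send_alert "SUCCESS" "Backup completed successfully
Files: ${backup_count}
Total size: ${backup_size}
Location: ${BACKUP_DIR}"
echo "Backup complete!"
else
send_alert "ERROR" "Backup finished with errors. Check the backup log."
echo "Backup finished with errors" >&2
exit 1
fi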
EOF
chmod +x ~/aztec-ops/backup.sh
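A backup is only useful if it can be read back. The sketch below checks that an archive is intact and contains the expected top-level entry. `verify_backup` is a hypothetical helper, not part of `backup.sh`, and the entry name `keystore` in the example assumes the default `KEYSTORE_PATH` from the config.

```shell
# Sketch: confirm a backup archive is readable and contains the expected entry.
# verify_backup is a hypothetical helper, not part of backup.sh.

# Usage: verify_backup ARCHIVE EXPECTED_TOP_LEVEL_NAME
verify_backup() {
  local archive="$1" expected="$2" listing
  [ -s "$archive" ] || { echo "Missing or empty: $archive" >&2; return 1; }
  # Listing the archive forces tar and gzip to read it from start to end.
  listing=$(tar -tzf "$archive" 2>/dev/null) || {
    echo "Corrupt archive: $archive" >&2
    return 1
  }
  # backup.sh stores the keystore directory under its base name.
  printf '%s\n' "$listing" | grep -Eq "^${expected}(/|\$)" || {
    echo "No '$expected' entry in $archive" >&2
    return 1
  }
}

# Example:
#   verify_backup "${BACKUP_DIR}/${BACKUP_NAME}-keystore.tar.gz" keystore
```

This only proves the archive is readable. Restoring it onto a spare machine from time to time is still the real test.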
6. Log Rotation
cat > ~/aztec-ops/log-rotate.sh << 'EOF'
#!/bin/bash
# Log Rotation for Aztec Nodes
# Rotate Docker logs
docker_log_size=$(docker system df --format '{{.Size}}' 2>/dev/null | head -1)
echo "Docker disk usage: $docker_log_size"
# Truncate container logs larger than 100 MB (104857600 bytes)
for container in $(docker ps -q); do
log_file=$(docker inspect --format='{{.LogPath}}' "$container" 2>/dev/null)
if [[ -f "$log_file" ]] && [[ $(stat -c%s "$log_file" 2>/dev/null) -gt 104857600 ]]; then
echo "Truncating log for container: $(docker inspect --format='{{.Name}}' $container)"
truncate -s 0 "$log_file"
fi
done
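# Clean up dangling images only. Avoid "docker system prune --volumes" here:
# it also removes stopped containers and unused volumes, which can include
# a crashed node's container and its data.
echo "Cleaning up dangling Docker images..."
docker image prune -f 2>/dev/null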
echo "Log rotation complete"
EOF
chmod +x ~/aztec-ops/log-rotate.sh
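An alternative to truncating log files is to let Docker cap them itself through the `json-file` logging driver's `max-size` and `max-file` options. The sketch below writes such a fragment to a file of your choosing. `write_log_opts` is a hypothetical helper, and the 100m / 5-file limits are assumptions to adjust.

```shell
# Sketch: let Docker cap container logs itself instead of truncating them.
# write_log_opts and the 100m / 5-file limits are illustrative assumptions.

# Usage: write_log_opts TARGET_FILE
# Writes a daemon.json fragment; merge it by hand into /etc/docker/daemon.json.
write_log_opts() {
  cat > "$1" << 'JSON'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
JSON
}

# Example: write_log_opts /tmp/daemon-log-opts.json
```

The settings take effect after the Docker daemon is restarted and apply only to containers created afterwards, so an existing node container has to be recreated to pick them up.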
7. Set Up Cron Jobs
Automate everything with cron:
# Open crontab editor
crontab -e
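Add these lines (the paths assume the toolkit was installed as root in /root/aztec-ops; adjust them if you run as another user):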
# Aztec Ops Toolkit - Automated Tasks
# ===================================
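# Watchdog - Every 5 minutes
# (the script already writes to /var/log/aztec-watchdog.log, so only stderr is appended here)
*/5 * * * * /root/aztec-ops/watchdog.sh > /dev/null 2>> /var/log/aztec-watchdog.log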
# Disk monitor - Every hour
0 * * * * /root/aztec-ops/disk-monitor.sh >> /var/log/aztec-ops.log 2>&1
# Backup - Daily at 3 AM
0 3 * * * /root/aztec-ops/backup.sh >> /var/log/aztec-backup.log 2>&1
# Log rotation - Weekly on Sunday at 4 AM
0 4 * * 0 /root/aztec-ops/log-rotate.sh >> /var/log/aztec-ops.log 2>&1
# Daily health report - Every day at 9 AM
0 9 * * * /root/aztec-ops/daily-report.sh >> /var/log/aztec-ops.log 2>&1
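The watchdog sleeps between checks and can run for a couple of minutes, so a hung Docker command could cause two runs to overlap. The sketch below skips a run while the previous one is still active. `with_lock` is a hypothetical wrapper and the lock path is an assumption; `flock` from util-linux is a common alternative where it is installed.

```shell
# Sketch: skip a cron run when the previous one is still in progress.
# with_lock is a hypothetical wrapper; mkdir is used because it is atomic.

# Usage: with_lock LOCK_DIR COMMAND [ARGS...]
# Returns 99 when the lock is already held (99 is an arbitrary choice),
# otherwise the exit status of COMMAND.
with_lock() {
  local lock_dir="$1" status
  shift
  mkdir "$lock_dir" 2>/dev/null || { echo "Already running, skipping" >&2; return 99; }
  "$@"
  status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Example: with_lock /tmp/aztec-watchdog.lock /root/aztec-ops/watchdog.sh
```

One caveat: if the process is killed before `rmdir` runs, the lock directory stays behind and has to be removed by hand.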
8. Daily Health Report
cat > ~/aztec-ops/daily-report.sh << 'EOF'
#!/bin/bash
# Daily Health Report for Aztec Nodes
source ~/aztec-ops/.env
source ~/aztec-ops/telegram-alert.sh
# Gather metrics
uptime_info=$(uptime -p)
disk_usage=$(df -h / | awk 'NR==2 {print $5}')
memory_info=$(free -h | awk 'NR==2 {printf "Used: %s / Total: %s", $3, $2}')
container_status=$(docker ps --filter "name=${NODE_CONTAINER}" --format "{{.Status}}" 2>/dev/null || echo "Unknown")
# Docker stats
if docker ps -q --filter "name=${NODE_CONTAINER}" | grep -q .; then
docker_stats=$(docker stats "${NODE_CONTAINER}" --no-stream --format "CPU: {{.CPUPerc}} | Mem: {{.MemUsage}}" 2>/dev/null)
else
docker_stats="Container not running"
fi
# Check last backup
last_backup=$(ls -t "${BACKUP_DIR}"/aztec-backup-* 2>/dev/null | head -1)
if [[ -n "$last_backup" ]]; then
backup_date=$(stat -c %y "$last_backup" | cut -d'.' -f1)
backup_info="Last: $backup_date"
else
backup_info="No backups found"
fi
# Build report
report="📊 *Daily Health Report*
🖥️ *System*
• Uptime: ${uptime_info}
• Disk: ${disk_usage} used
• Memory: ${memory_info}
🐳 *Container: ${NODE_CONTAINER}*
• Status: ${container_status}
• ${docker_stats}
💾 *Backups*
• ${backup_info}
---
_Report generated automatically_"
send_alert "INFO" "$report"
EOF
chmod +x ~/aztec-ops/daily-report.sh
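The report prints the date of the last backup but leaves it to the reader to notice when that date is old. A small age check could flag a stale backup explicitly, as sketched below. `backup_age_hours`, `backup_is_stale`, and the 26-hour default (one daily run plus some slack) are assumptions for illustration.

```shell
# Sketch: flag a stale backup instead of only printing its date.
# backup_age_hours, backup_is_stale and the 26-hour default are assumptions.

# Usage: backup_age_hours FILE [NOW_EPOCH]   (uses GNU stat, as the report does)
backup_age_hours() {
  local file="$1" now="${2:-$(date +%s)}" mtime
  mtime=$(stat -c %Y "$file") || return 1
  echo $(( (now - mtime) / 3600 ))
}

# Usage: backup_is_stale FILE [MAX_HOURS] [NOW_EPOCH]
# Returns 0 when the backup is older than MAX_HOURS or cannot be read.
backup_is_stale() {
  local age
  age=$(backup_age_hours "$1" "${3:-$(date +%s)}") || return 0
  [ "$age" -gt "${2:-26}" ]
}

# Example:
#   backup_is_stale "$last_backup" && backup_info="$backup_info (STALE)"
```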
9. Quick Commands Reference
Create a helper script with common commands:
cat > ~/aztec-ops/aztec-ops << 'EOF'
#!/bin/bash
# Aztec Ops - Quick command wrapper
case "$1" in
status)
docker ps --filter "name=aztec"
;;
logs)
docker logs -f --tail 100 "${2:-aztec-sequencer}"
;;
restart)
docker restart "${2:-aztec-sequencer}"
;;
backup)
~/aztec-ops/backup.sh
;;
alert)
source ~/aztec-ops/telegram-alert.sh
send_alert "${2:-INFO}" "${3:-Manual alert}"
;;
report)
~/aztec-ops/daily-report.sh
;;
disk)
~/aztec-ops/disk-monitor.sh
;;
*)
echo "Aztec Ops Toolkit"
echo ""
echo "Usage: aztec-ops <command> [args]"
echo ""
echo "Commands:"
echo " status Show running Aztec containers"
echo " logs [name] Follow container logs"
echo " restart [name] Restart container"
echo " backup Run backup now"
echo " alert [level] [msg] Send Telegram alert"
echo " report Send daily health report"
echo " disk Check disk space"
;;
esac
EOF
chmod +x ~/aztec-ops/aztec-ops
sudo ln -sf ~/aztec-ops/aztec-ops /usr/local/bin/aztec-ops
Now you can use:
aztec-ops status
aztec-ops logs
aztec-ops backup
aztec-ops report
Verification Checklist
After setup, verify everything works:
# 1. Test Telegram alerts
~/aztec-ops/telegram-alert.sh
# 2. Run watchdog manually
~/aztec-ops/watchdog.sh
# 3. Check disk monitor
~/aztec-ops/disk-monitor.sh
# 4. Test backup
~/aztec-ops/backup.sh
# 5. Send daily report
~/aztec-ops/daily-report.sh
# 6. Verify cron jobs are scheduled
crontab -l | grep aztec
Troubleshooting
Telegram alerts not working
# Load the bot token and chat ID into the current shell
source ~/aztec-ops/.env
# Test API connection
curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe"
# Check chat ID is correct
curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getUpdates" | jq .
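The alert function discards curl's output, so a rejected message fails silently. Telegram Bot API responses carry an `ok` field, which makes failures easy to detect once the response is captured. The sketch below shows one way to check it; `telegram_ok` is a hypothetical helper that inspects only that field.

```shell
# Sketch: turn a Telegram Bot API response into a pass/fail result.
# telegram_ok is a hypothetical helper; it only inspects the "ok" field.
telegram_ok() {
  # Successful responses contain "ok":true. Failures carry "ok":false
  # plus a "description" explaining what went wrong.
  printf '%s' "$1" | grep -q '"ok":[[:space:]]*true'
}

# Example (needs network access and a valid token):
#   response=$(curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getMe")
#   telegram_ok "$response" || echo "Telegram rejected the request: $response" >&2
```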
Watchdog not restarting container
# Check Docker permissions
docker ps
# Verify container name in .env matches actual container
docker ps --format '{{.Names}}'
Backups failing
# Check backup directory permissions
ls -la ~/aztec-backups/
# Verify keystore path exists
ls -la ~/.aztec/keystore/
Next Steps
- Grafana Dashboard: Set up Monitoring Stack for visual metrics
- PagerDuty Integration: Upgrade alerts for on-call rotation
- Remote Backups: Sync backups to S3 or external storage
Your Aztec infrastructure is now production-ready with automated monitoring, alerting, and backups.
