Ops Subdomains Runbook

This runbook documents production access for operations services:

  • https://queues.aaperture.com
  • https://grafana.aaperture.com
  • https://prometheus.aaperture.com

Goals

  • Keep ops UIs outside the main app routes.
  • Protect access with authentication.
  • Make TLS/DNS troubleshooting quick and repeatable.

Nginx Files

  • Main app vhost: infra/nginx-aaperture.conf
  • Ops subdomains vhost: infra/nginx-ops-subdomains.conf

Install both files and make sure the symlink names match the include pattern in nginx.conf. Nginx silently skips any file the pattern does not match:

  • If includes are sites-enabled/*.com, symlinks must end with .com.
  • If includes are sites-enabled/*.conf, symlinks must end with .conf.
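One way to catch a mismatch is to list every entry in the enabled-sites directory whose name does not end with the suffix the include line expects. The sketch below does that; the function name check_symlink_suffix is hypothetical, and the /etc/nginx/sites-enabled path in the usage line is an assumption about a typical layout.

```shell
# Hypothetical helper: report entries in DIR whose names do not end in SUFFIX.
# Returns non-zero if any entry would be skipped by an include like DIR/*SUFFIX.
check_symlink_suffix() {
  dir="$1"
  suffix="$2"
  rc=0
  for entry in "$dir"/*; do
    # Skip the unexpanded pattern when the directory is empty.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    case "$entry" in
      *"$suffix") ;;
      *)
        echo "not matched by include *$suffix: $entry"
        rc=1
        ;;
    esac
  done
  return "$rc"
}
```

Usage, assuming the include line is sites-enabled/*.conf: check_symlink_suffix /etc/nginx/sites-enabled .conf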

Required DNS

Create A records pointing to the production server IP:

  • queues.aaperture.com
  • grafana.aaperture.com
  • prometheus.aaperture.com
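Before requesting certificates, it can help to confirm that each record already resolves to the server. The following is a minimal sketch that assumes dig is installed; check_dns is a hypothetical helper name, and the IP in the usage line is a documentation placeholder, not the real production address.

```shell
# Hypothetical helper: compare each host's A record with the expected server IP.
# Only the last line of the dig answer is compared, so a CNAME chain resolves to
# its final address; hosts with several A records need a closer look by hand.
check_dns() {
  expected_ip="$1"
  shift
  rc=0
  for host in "$@"; do
    resolved=$(dig +short "$host" A | tail -n 1)
    if [ "$resolved" = "$expected_ip" ]; then
      echo "ok       $host -> $resolved"
    else
      echo "MISMATCH $host -> ${resolved:-no answer} (expected $expected_ip)"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage, with 203.0.113.10 standing in for the production IP: check_dns 203.0.113.10 queues.aaperture.com grafana.aaperture.com prometheus.aaperture.com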

TLS Certificates

Issue certs after DNS propagation (example with certbot/nginx plugin):

sudo certbot --nginx -d queues.aaperture.com
sudo certbot --nginx -d grafana.aaperture.com
sudo certbot --nginx -d prometheus.aaperture.com

Security

  • Bull Board basic auth is configured through the API's environment variables:
    • QUEUE_BOARD_BASIC_AUTH_ENABLED=true
    • QUEUE_BOARD_BASIC_AUTH_USER
    • QUEUE_BOARD_BASIC_AUTH_PASSWORD
  • Prometheus can also be protected at Nginx level via auth_basic.
  • Grafana auth is managed by Grafana admin credentials:
    • GRAFANA_ADMIN_USER
    • GRAFANA_ADMIN_PASSWORD
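A preflight check can confirm that these variables are set before a deploy. The sketch below assumes the variables are present in the shell that runs it (for example after sourcing the environment file); require_env is a hypothetical helper name, not part of the project.

```shell
# Hypothetical helper: fail if any of the named variables is unset or empty.
# Variable names are passed as arguments; values are never printed.
require_env() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: require_env QUEUE_BOARD_BASIC_AUTH_USER QUEUE_BOARD_BASIC_AUTH_PASSWORD GRAFANA_ADMIN_USER GRAFANA_ADMIN_PASSWORD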

Admin Ops Checks (Application)

Admin users can validate ops health and alerting directly from the app:

  • GET /ops/status: checks internal reachability of queues/grafana/prometheus and returns up/down.
  • POST /observability/alerts/test: sends a synthetic alert to configured Slack/email channels.
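The same endpoints can be called from a terminal. The runbook does not say how admin requests are authenticated or where the API is served, so the sketch below assumes a bearer token and uses two hypothetical variables, API_BASE_URL and ADMIN_TOKEN; adjust both to the real setup.

```shell
# Assumptions: API_BASE_URL points at the application API and ADMIN_TOKEN is a
# bearer token for an admin user. Neither name comes from the runbook.

# GET /ops/status: prints the reachability report for the ops services.
ops_status() {
  curl -fsS -H "Authorization: Bearer $ADMIN_TOKEN" "$API_BASE_URL/ops/status"
}

# POST /observability/alerts/test: triggers a synthetic alert.
send_test_alert() {
  curl -fsS -X POST -H "Authorization: Bearer $ADMIN_TOKEN" \
    "$API_BASE_URL/observability/alerts/test"
}
```

With -f, curl exits non-zero on HTTP errors, so either function can be used directly in a script condition.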

UI location:

  • Admin panel → Server Status tab → "Ops Status" card.

Grafana Persistence Permissions

infra/docker-compose.prod.yml mounts ${APP_DIR}/.data/grafana to /var/lib/grafana. This host folder must be writable by UID/GID 472 (the commands below assume APP_DIR is /var/www/aaperture):

sudo mkdir -p /var/www/aaperture/.data/grafana
sudo chown -R 472:472 /var/www/aaperture/.data/grafana
sudo chmod -R u+rwX,g+rwX /var/www/aaperture/.data/grafana
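To confirm the ownership took effect, the directory's numeric owner can be compared with the expected value. This is a small sketch using GNU stat; check_owner is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if TARGET is owned by the expected UID:GID.
# Uses GNU stat syntax (-c); BSD/macOS stat takes different flags.
check_owner() {
  target="$1"
  expected="$2"
  actual=$(stat -c '%u:%g' "$target") || return 1
  echo "$target is owned by $actual"
  [ "$actual" = "$expected" ]
}
```

Usage: check_owner /var/www/aaperture/.data/grafana 472:472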

Validation Commands

sudo nginx -t
sudo systemctl reload nginx

curl -I https://queues.aaperture.com
curl -I https://grafana.aaperture.com
curl -I https://prometheus.aaperture.com

Certificate/SAN check (the -ext option requires OpenSSL 1.1.1 or later):

echo | openssl s_client -connect queues.aaperture.com:443 -servername queues.aaperture.com 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
echo | openssl s_client -connect grafana.aaperture.com:443 -servername grafana.aaperture.com 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName
echo | openssl s_client -connect prometheus.aaperture.com:443 -servername prometheus.aaperture.com 2>/dev/null | openssl x509 -noout -subject -ext subjectAltName

Troubleshooting

ERR_CERT_COMMON_NAME_INVALID

  • Verify that the expected server_name blocks are loaded: sudo nginx -T | grep -n "server_name".
  • Verify include pattern (*.com vs *.conf) matches symlink names.
  • Recheck cert paths in each TLS server block.
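This error generally means the certificate being served does not list the requested hostname, so it helps to see each server_name next to the certificate paths that follow it. The filter below is a sketch that reads an nginx -T dump on standard input; show_cert_mapping is a hypothetical helper name.

```shell
# Hypothetical helper: from an `nginx -T` dump on stdin, keep only the active
# (uncommented) server_name and ssl_certificate* lines, with line numbers.
show_cert_mapping() {
  grep -nE '^[[:space:]]*(server_name|ssl_certificate|ssl_certificate_key)[[:space:]]'
}
```

Usage: sudo nginx -T 2>/dev/null | show_cert_mapping. Each ops hostname should be followed by the certificate and key issued for that hostname.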

502 on Grafana

  • Check container status/logs:
    • docker ps | grep grafana
    • docker logs --tail=200 <grafana-container>
  • The most common cause is a permission-denied error on /var/lib/grafana; see Grafana Persistence Permissions above.

Certbot HTTP-01 challenge returns 404

  • Ensure the port 80 server block serves /.well-known/acme-challenge/ from a reachable webroot.
  • Confirm DNS points to the same server handling Nginx.
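Both points can be tested together by dropping a probe file into the challenge directory and fetching it over plain HTTP. The sketch below assumes a webroot-style setup; check_acme_path is a hypothetical helper name, and the webroot path passed to it is whatever the port 80 block actually uses, which this runbook does not specify.

```shell
# Hypothetical helper: write a probe file under WEBROOT and fetch it via HOST.
# Succeeds only if the request ends in HTTP 200. Redirects are followed (-L).
check_acme_path() {
  webroot="$1"
  host="$2"
  token="probe-$$"
  dir="$webroot/.well-known/acme-challenge"
  mkdir -p "$dir" || return 1
  echo "ok" > "$dir/$token"
  code=$(curl -sL -o /dev/null -w '%{http_code}' \
    "http://$host/.well-known/acme-challenge/$token")
  # Always remove the probe file, whatever the result.
  rm -f "$dir/$token"
  echo "$host -> HTTP $code"
  [ "$code" = "200" ]
}
```

A 404 here while the file exists on disk points at the Nginx location or root, while a timeout or an unexpected response points at DNS sending traffic to a different server.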