Fabrik

Services

What each of the 7 Fabrik services does, how they depend on each other, and how to tell when one is unhealthy.

The Fabrik stack is 7 long-running containers. They share one Docker image where possible, one Docker network, and one .env file. This page walks each service in order of dependency — the ones lower in the chain have to be up before the ones above start cleanly.

Dependency graph

Compose uses depends_on: condition: service_healthy, which means the Python services won't even try to start until Postgres reports pg_isready, Redis replies to PING, and Neo4j responds on port 7474. If one data service is slow to come up, the stack waits — it doesn't fail.
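To see which containers are still holding the stack back, you can read the health state that Compose reports. The following is a minimal sketch, not part of Fabrik: it parses the output of `docker compose ps --format json` and lists services that are not yet running or healthy. The field names (`Service`, `State`, `Health`) are assumed from recent Compose v2 releases, so check them against your version.

```python
import json


def parse_compose_ps(output: str) -> list[dict]:
    """Parse `docker compose ps --format json` output.

    Depending on the Compose version, the command prints either a single
    JSON array or one JSON object per line, so both shapes are accepted.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def not_ready(containers: list[dict]) -> list[str]:
    """Return services that are not running, or running but not healthy."""
    blocked = []
    for container in containers:
        state = container.get("State", "")
        # Assumption: Health is an empty string when no healthcheck is defined.
        health = container.get("Health", "")
        if state != "running" or health not in ("", "healthy"):
            blocked.append(container.get("Service", container.get("Name", "?")))
    return blocked


# Illustrative input, not captured from a real Fabrik deployment.
sample = "\n".join([
    '{"Service": "postgres", "State": "running", "Health": "healthy"}',
    '{"Service": "neo4j", "State": "running", "Health": "starting"}',
    '{"Service": "celery-beat", "State": "running", "Health": ""}',
])
print(not_ready(parse_compose_ps(sample)))
```

A service stuck in `starting` for a long time is usually the one the Python services are waiting on.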

Data services

postgres

Image: postgres:17-alpine · Volume: fabrik_postgres_data

Primary relational store for everything non-graph: users, groups, saved queries, scheduled tasks, AWX templates, audit logs, time machine snapshot metadata. Exposed on 127.0.0.1:5432 so the host can inspect it — not the public network.

Healthcheck: pg_isready -U fabrik every 10 s. If this ever reports unhealthy, every Python service will follow within a minute.

Backup: pg_dump against the container (see Upgrading and backup).

neo4j

Image: neo4j:5.26 · Volumes: fabrik_neo4j_data, fabrik_neo4j_logs

Graph database storing the ACI MIM — class hierarchy, containment rules, property definitions. Populated by the backend on first boot from the MIM registry matching APIC_VERSION, or by explicit MIM imports triggered from the admin UI.

Heap and page cache: NEO4J_HEAP_MAX_SIZE (default 1 GB) and NEO4J_PAGECACHE_SIZE (default 256 MB) together account for most of Neo4j's resident memory (RSS). Keep their sum comfortably below the memory limit you give the container, leaving headroom for the JVM's other needs.
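The sizing rule can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the 25 % headroom figure is an arbitrary illustration rather than a Fabrik or Neo4j recommendation, and the parser accepts only the simple `1G` / `256m` style of size string.

```python
import re

_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size such as '1G', '256m' or '1048576' to bytes."""
    match = re.fullmatch(r"(\d+)\s*([kKmMgG])?[bB]?", value.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * (_UNITS[unit.lower()] if unit else 1)


def fits(heap: str, pagecache: str, container_limit: str,
         headroom: float = 0.25) -> bool:
    """True if heap + page cache leave at least `headroom` of the limit free.

    `headroom` is a hypothetical safety margin chosen for this example.
    """
    used = parse_size(heap) + parse_size(pagecache)
    return used <= parse_size(container_limit) * (1 - headroom)


# Defaults from this page: 1 GB heap + 256 MB page cache = 1280 MB.
print(fits("1G", "256m", "2G"))
print(fits("1G", "256m", "1536m"))
```

With the defaults, a 2 GB container limit passes the check, while a 1.5 GB limit leaves less than a quarter of the container free and fails it.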

Healthcheck: HTTP check against http://localhost:7474 with a 30 s startup grace period. Neo4j takes the longest to warm up; that's normal.

redis

Image: redis:8-alpine · Volume: fabrik_redis_data

Four overlapping roles:

  1. Celery broker (/0) — task queue for backend → worker dispatch.
  2. Celery result backend (/1) — short-lived result storage.
  3. Django Channels layer — WebSocket group membership and message routing.
  4. MIM cache — short-TTL responses from Neo4j (see the cache tiers in backend/mim/cache.py).

No authentication by default — Redis is only reachable from other containers on fabrik-network.

Healthcheck: redis-cli ping.
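For orientation, the roles listed above could map onto Django settings roughly as follows. This is a hedged sketch, not Fabrik's actual settings module: the host name `redis`, the database indexes used for Channels and the cache, and the use of Django's cache framework for the MIM cache are all assumptions. Only the `/0` and `/1` indexes for Celery come from this page.

```python
# settings.py fragment (illustrative only).
REDIS_HOST = "redis"   # assumption: the Compose service name resolves on fabrik-network
REDIS_PORT = 6379      # Redis default port


def redis_url(db: int) -> str:
    """Build a redis:// URL for the given logical database index."""
    return f"redis://{REDIS_HOST}:{REDIS_PORT}/{db}"


CELERY_BROKER_URL = redis_url(0)       # role 1: task queue
CELERY_RESULT_BACKEND = redis_url(1)   # role 2: short-lived results

CHANNEL_LAYERS = {                     # role 3: WebSocket groups and routing
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {"hosts": [redis_url(2)]},   # db 2 is an assumption
    }
}

CACHES = {                             # role 4: MIM cache
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": redis_url(3),      # db 3 is an assumption
    }
}
```

Keeping each role on its own logical database makes it possible to flush one (for example the cache) without disturbing queued tasks.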

Application services

backend

Image: fabrik-backend:latest (built from backend/Dockerfile)

Django 6 running under Daphne ASGI. Handles every HTTP request and every WebSocket connection. The container entrypoint runs migrate → bootstrap_mim → daphne, so every restart:

  1. Applies pending Django migrations (idempotent).
  2. Seeds Neo4j with the MIM matching APIC_VERSION if the graph is empty.
  3. Starts the ASGI server.

Exposed port: ${BACKEND_PORT:-8000} on the host. In production, put nginx in front and keep this bound to 127.0.0.1.

Health: Reachable on GET /api/health/ — returns {"status": "ok"} plus version info. Scrape-friendly.
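A monitoring script can poll this endpoint with nothing but the standard library. The sketch below assumes the default host port of 8000 and checks only the `status` field described above; the function names are illustrative.

```python
import http.client
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True if the payload is a JSON object whose "status" is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def probe(url: str = "http://127.0.0.1:8000/api/health/",
          timeout: float = 3.0) -> bool:
    """GET the health endpoint; connection or HTTP failures count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

Calling `probe()` from cron or an external monitor gives a simple up/down signal without needing access to the Docker socket.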

AWX status updates flow in through two channels: a webhook receiver at POST /api/awx/webhooks/receiver/ (HMAC-validated, calls JobMonitor directly) and a Celery beat task that polls AWX every 30 seconds. Whichever arrives first updates the execution row and emits WebSocket progress to the frontend.
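The HMAC validation step on the webhook path generally looks like the sketch below. This is not Fabrik's receiver code: the hash algorithm (SHA-256), the hex encoding of the signature, and how the secret and signature reach the function are all assumptions, since this page states only that the webhook is HMAC-validated.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Recompute the HMAC over the raw request body and compare in constant time.

    Assumes SHA-256 and a hex-encoded signature; adjust to match the sender.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = received_hex.strip().lower()
    # Compare as bytes so arbitrary header content cannot raise a TypeError.
    return hmac.compare_digest(expected.encode(), received.encode())


# Hypothetical values for illustration.
secret = b"shared-webhook-secret"
body = b'{"id": 42, "status": "successful"}'
good_signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
print(signature_is_valid(secret, body, good_signature))
```

The signature must be computed over the raw bytes of the request body, before any JSON parsing, or a re-serialised payload will not match.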

celery-worker

Image: fabrik-backend:latest (same image as backend)

The workhorse. One container runs a single Celery process with configurable concurrency (CELERY_WORKER_CONCURRENCY, default 2). It subscribes to seven queues:

celery, query_exec, scheduled, awx_monitor, awx_exec, maintenance, mim_import

Tasks route to a queue based on @shared_task(queue=...) in the code — you don't configure routing in .env. Scale by running more worker containers rather than raising concurrency beyond 4 (Python GIL trade-offs).
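The scaling advice above comes down to simple arithmetic: total task slots equal the number of worker containers times the concurrency of each. A small sketch, using a hypothetical helper name, that picks the fewest containers while keeping concurrency at or below 4:

```python
import math


def plan_workers(target_slots: int, max_concurrency: int = 4) -> tuple[int, int]:
    """Return (replicas, concurrency) providing at least `target_slots` slots.

    Concurrency never exceeds `max_concurrency`; extra capacity is added by
    running more containers instead.
    """
    if target_slots < 1:
        raise ValueError("target_slots must be at least 1")
    replicas = math.ceil(target_slots / max_concurrency)
    concurrency = math.ceil(target_slots / replicas)
    return replicas, concurrency


print(plan_workers(2))    # the default: one container, concurrency 2
print(plan_workers(12))
```

For a target of 12 slots the result is 3 containers at concurrency 4, which corresponds to setting CELERY_WORKER_CONCURRENCY=4 and scaling celery-worker to 3 replicas.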

celery-beat

Image: fabrik-backend:latest

Scheduler. One process, one replica — running two would double-fire every scheduled task. Reads the schedule from the database (django_celery_beat tables) and emits tasks onto Redis for workers to pick up.

Beat has no healthcheck beyond process liveness; watch its logs for Scheduler: Sending due task... lines when you expect a job to fire.
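When a job seems not to have fired, it helps to pull just the dispatch lines out of the beat log. The sketch below assumes Celery's usual message shape, `Scheduler: Sending due task <name> (<task path>)`; the schedule name and task path in the sample are hypothetical.

```python
import re

# Matches the schedule entry name and the dotted task path in parentheses.
DUE_TASK = re.compile(
    r"Scheduler: Sending due task (?P<name>.+?) \((?P<task>[\w.]+)\)"
)


def due_tasks(log_text: str) -> list[tuple[str, str]]:
    """Return (schedule name, task path) for every dispatch line in the log."""
    return [(m["name"], m["task"]) for m in DUE_TASK.finditer(log_text)]


# Illustrative log excerpt, not captured from a real deployment.
sample_log = (
    "celery-beat-1  | [2025-01-01 00:00:00,000: INFO/MainProcess] beat: Starting...\n"
    "celery-beat-1  | [2025-01-01 00:05:00,000: INFO/MainProcess] "
    "Scheduler: Sending due task nightly-cleanup (maintenance.tasks.cleanup)\n"
)
print(due_tasks(sample_log))
```

Feed it the text of `docker compose logs celery-beat`; an empty result over a window where a task was due points at the schedule, not the workers.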

frontend

Image: fabrik-frontend:latest

A multi-stage build: stage 1 runs vite build, stage 2 is nginx:alpine serving the static SPA from /usr/share/nginx/html and reverse-proxying /api/, /admin/, /static/, and /ws/ to the backend container on port 8000. nginx listens on port 80 inside the container; map it to the host with FRONTEND_PORT (default 80).

Because all browser traffic terminates here, you usually don't expose the backend port externally. The frontend is the only public entry point; the backend is reachable only through the nginx proxy on the same container network.
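As a mental model of the proxy split, the routing can be expressed as a prefix check. This is a simplified sketch that assumes plain prefix matching on the four paths named above; the real nginx configuration may contain additional rules.

```python
# Path prefixes that nginx forwards to the backend container, per this page.
PROXIED_PREFIXES = ("/api/", "/admin/", "/static/", "/ws/")


def served_by(path: str) -> str:
    """Return "backend" for proxied prefixes, otherwise "spa" (static files)."""
    return "backend" if path.startswith(PROXIED_PREFIXES) else "spa"


print(served_by("/api/health/"))
print(served_by("/queries/42"))
```

Anything that is not under one of those prefixes is answered by nginx itself from the built SPA, so a 404 on an API path usually means the request never matched a proxied prefix.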

Reading the logs

Every service logs JSON lines to the Docker daemon with a 10 MB × 3 file rotation. Common patterns:

# Everything, followed
docker compose logs -f

# Just one service
docker compose logs -f backend

# Last 100 lines from workers and beat
docker compose logs --tail=100 celery-worker celery-beat

Scaling guidance

  • More users / more queries running concurrently → scale celery-worker (run docker compose up --scale celery-worker=3) or raise CELERY_WORKER_CONCURRENCY.
  • Big MIM / many APIC versions → raise NEO4J_HEAP_MAX_SIZE and NEO4J_PAGECACHE_SIZE, grow the container memory limit to match.
  • Heavy audit log traffic → the load lands on Postgres. Consider moving to a managed Postgres and pointing DATABASE_URL at it.

Always check docker stats first and scale second.
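One way to make that check repeatable is to parse a single `docker stats` sample. The sketch below assumes the output of `docker stats --no-stream --format '{{json .}}'`, which prints one JSON object per container with `Name` and `MemPerc` fields; the 80 % threshold and the container names in the sample are illustrative.

```python
import json


def over_memory_threshold(stats_output: str,
                          threshold_percent: float = 80.0) -> list[tuple[str, float]]:
    """Return (container name, memory %) pairs at or above the threshold,
    highest first."""
    flagged = []
    for line in stats_output.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        memory_percent = float(row["MemPerc"].rstrip("%"))
        if memory_percent >= threshold_percent:
            flagged.append((row["Name"], memory_percent))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative sample, not real measurements.
sample_stats = "\n".join([
    '{"Name": "fabrik-neo4j-1", "MemPerc": "91.50%", "CPUPerc": "3.10%"}',
    '{"Name": "fabrik-backend-1", "MemPerc": "22.00%", "CPUPerc": "1.00%"}',
])
print(over_memory_threshold(sample_stats))
```

Note that the memory percentage is relative to the container's limit, or to host memory when no limit is set, so the figure is most useful for containers that have an explicit limit.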