Services
What each of the 7 Fabrik services does, how they depend on each other, and how to tell when one is unhealthy.
The Fabrik stack is 7 long-running containers. They share one Docker image where possible, one Docker network, and one .env file. This page walks each service in order of dependency — the ones lower in the chain have to be up before the ones above start cleanly.
Dependency graph
Compose uses depends_on: condition: service_healthy, which means the Python services won't even try to start until Postgres reports pg_isready, Redis replies to PING, and Neo4j responds on port 7474. If one data service is slow to come up, the stack waits — it doesn't fail.
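The gating described above can be sketched as a small dependency map. This is an illustration only, not the contents of the project's compose file: the edges from the Python services to the three data services follow the text, while the frontend → backend edge and the helper names `start_order` and `blocked_by` are assumptions.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services that must be healthy first.
# Data-service edges come from the text; frontend -> backend is an assumption.
DEPENDS_ON = {
    "postgres": set(),
    "neo4j": set(),
    "redis": set(),
    "backend": {"postgres", "neo4j", "redis"},
    "celery-worker": {"postgres", "neo4j", "redis"},
    "celery-beat": {"postgres", "neo4j", "redis"},
    "frontend": {"backend"},
}


def start_order(depends_on):
    """Return one valid start order: dependencies always precede dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_by(service, healthy, depends_on=DEPENDS_ON):
    """Return the dependencies of `service` that are not yet healthy."""
    return sorted(depends_on[service] - set(healthy))


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))
    # While Neo4j is still warming up, the backend simply waits on it.
    print(blocked_by("backend", healthy={"postgres", "redis"}))
```

A non-empty `blocked_by` result corresponds to the "stack waits" state: nothing has failed, a dependency just hasn't passed its healthcheck yet.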
Data services
postgres
Image: postgres:17-alpine · Volume: fabrik_postgres_data
Primary relational store for everything non-graph: users, groups, saved queries, scheduled tasks, AWX templates, audit logs, time machine snapshot metadata. Bound to 127.0.0.1:5432 so the host can inspect it; it is not exposed on the public network.
Healthcheck: pg_isready -U fabrik every 10 s. If this ever reports unhealthy, every Python service will follow within a minute.
Backup: pg_dump against the container (see Upgrading and backup).
neo4j
Image: neo4j:5.26 · Volumes: fabrik_neo4j_data, fabrik_neo4j_logs
Graph database storing the ACI MIM — class hierarchy, containment rules, property definitions. Populated by the backend on first boot from the MIM registry matching APIC_VERSION, or by explicit MIM imports triggered from the admin UI.
Heap and page cache: NEO4J_HEAP_MAX_SIZE (default 1 GB) and NEO4J_PAGECACHE_SIZE (default 256 MB) together dictate Neo4j's RSS. Keep their sum comfortably below whatever memory you allot the container, leaving headroom for other JVM needs.
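A rough way to sanity-check the sizing is to do the arithmetic explicitly. The sketch below uses the documented defaults (1 GB heap, 256 MB page cache); the 25% headroom threshold and the function names are assumptions chosen for illustration, not a Neo4j recommendation.

```python
def neo4j_headroom_mib(heap_mib, pagecache_mib, container_limit_mib):
    """MiB left in the container after heap and page cache are accounted for."""
    return container_limit_mib - (heap_mib + pagecache_mib)


def fits(heap_mib, pagecache_mib, container_limit_mib, min_headroom_fraction=0.25):
    """True if the leftover memory is at least the given fraction of the limit.

    The 0.25 default is an assumed rule of thumb, not a documented value.
    """
    headroom = neo4j_headroom_mib(heap_mib, pagecache_mib, container_limit_mib)
    return headroom >= container_limit_mib * min_headroom_fraction


if __name__ == "__main__":
    # Defaults from this page: 1024 MiB heap + 256 MiB page cache = 1280 MiB.
    print(neo4j_headroom_mib(1024, 256, 2048))  # 768
    print(fits(1024, 256, 2048))                # True
    print(fits(1024, 256, 1536))                # False
```

With the defaults, a 2 GiB container limit leaves 768 MiB for everything else the JVM needs, while a 1.5 GiB limit leaves only 256 MiB.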
Healthcheck: HTTP check against http://localhost:7474 with a 30 s startup grace period. Neo4j takes the longest to warm up; that's normal.
redis
Image: redis:8-alpine · Volume: fabrik_redis_data
Four overlapping roles:
- Celery broker (/0): task queue for backend → worker dispatch.
- Celery result backend (/1): short-lived result storage.
- Django Channels layer: WebSocket group membership and message routing.
- MIM cache: short-TTL responses from Neo4j (see the cache tiers in backend/mim/cache.py).
No authentication by default — Redis is only reachable from other containers on fabrik-network.
Healthcheck: redis-cli ping.
Application services
backend
Image: fabrik-backend:latest (built from backend/Dockerfile)
Django 6 running under Daphne ASGI. Handles every HTTP request and every WebSocket connection. The container entrypoint runs migrate → bootstrap_mim → daphne, so every restart:
- Applies pending Django migrations (idempotent).
- Seeds Neo4j with the MIM matching APIC_VERSION if the graph is empty.
- Starts the ASGI server.
Exposed port: ${BACKEND_PORT:-8000} on the host. In production, put nginx in front and keep this bound to 127.0.0.1.
Health: Reachable on GET /api/health/ — returns {"status": "ok"} plus version info. Scrape-friendly.
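For an external probe, a minimal sketch using only the standard library might look like the following. It assumes the backend is reachable at http://127.0.0.1:8000 (the default BACKEND_PORT) and checks only the documented `status` field; the function names are hypothetical.

```python
import json
import urllib.request


def is_ok(body: bytes) -> bool:
    """True if the response body is a JSON object whose "status" is "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_backend(base_url: str = "http://127.0.0.1:8000", timeout: float = 3.0) -> bool:
    """Probe GET /api/health/; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health/", timeout=timeout) as resp:
            return resp.status == 200 and is_ok(resp.read())
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


if __name__ == "__main__":
    print("backend healthy:", check_backend())
```

Extra keys such as the version info are ignored, so the probe keeps working if the payload grows.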
AWX status updates flow in through two channels: a webhook receiver at POST /api/awx/webhooks/receiver/ (HMAC-validated, calls JobMonitor directly) and a Celery beat task that polls AWX every 30 seconds. Whichever arrives first updates the execution row and emits WebSocket progress to the frontend.
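HMAC validation of a webhook body generally follows the pattern below. This is a sketch under stated assumptions, not the receiver's actual code: the page does not say which digest or header Fabrik uses, so SHA-256 with a hex-encoded signature is assumed, and `sign` and `is_valid_signature` are hypothetical names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC of the raw request body (SHA-256 is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare signatures in constant time to avoid leaking timing information."""
    expected = sign(secret, body)
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder value for the demo
    body = b'{"id": 42, "status": "successful"}'
    signature = sign(secret, body)
    print(is_valid_signature(secret, body, signature))         # True
    print(is_valid_signature(secret, body + b" ", signature))  # False
```

The signature must be computed over the raw bytes of the request body, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the digest.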
celery-worker
Image: fabrik-backend:latest (same image as backend)
The workhorse. One container runs a single Celery process with configurable concurrency (CELERY_WORKER_CONCURRENCY, default 2). It subscribes to seven queues:
celery, query_exec, scheduled, awx_monitor, awx_exec, maintenance, mim_import
Tasks route to a queue based on @shared_task(queue=...) in the code; you don't configure routing in .env. Scale by running more worker containers rather than raising concurrency beyond 4 (Python GIL trade-offs).
celery-beat
Image: fabrik-backend:latest
Scheduler. One process, one replica — running two would double-fire every scheduled task. Reads the schedule from the database (django_celery_beat tables) and emits tasks onto Redis for workers to pick up.
Beat has no healthcheck beyond process liveness; watch its logs for Scheduler: Sending due task... lines when you expect a job to fire.
frontend
Image: fabrik-frontend:latest
A multi-stage build: stage 1 runs vite build, stage 2 is nginx:alpine serving the static SPA from /usr/share/nginx/html and reverse-proxying /api/, /admin/, /static/, and /ws/ to the backend container on port 8000. nginx listens on port 80 inside the container; map it to the host with FRONTEND_PORT (default 80).
Because all browser traffic terminates here, you usually don't expose the backend port externally. The frontend is the only public entry point; the backend is reachable only through the nginx proxy on the same container network.
Reading the logs
Every service logs JSON lines to the Docker daemon with a 10 MB × 3 file rotation. Common patterns:
# Everything, followed
docker compose logs -f
# Just one service
docker compose logs -f backend
# Last 100 lines from workers and beat
docker compose logs --tail=100 celery-worker celery-beat
Scaling guidance
- More users / more queries running concurrently → scale celery-worker (run docker compose up --scale celery-worker=3) or raise CELERY_WORKER_CONCURRENCY.
- Big MIM / many APIC versions → raise NEO4J_HEAP_MAX_SIZE and NEO4J_PAGECACHE_SIZE, and grow the container memory limit to match.
- Heavy audit log traffic → Postgres is the likely bottleneck. Consider a managed Postgres and point DATABASE_URL at it.
Always docker stats first, scale second.