Deploy AETHER on Microsoft Azure

1. Prerequisites

Accounts, tooling, and access you need before provisioning anything.

Install and authenticate the Azure CLI

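Install the Azure CLI and sign in with an identity that has Contributor rights on the target subscription. Granting managed identities access to Key Vault or ACR (role assignments) additionally requires Owner or User Access Administrator. Set the active subscription and create a resource group (this guide uses southafricanorth).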

Sign in and create a resource group

az --version
az login
az account set --subscription '<SUBSCRIPTION_ID>'
az group create --name afroai-rg --location southafricanorth

Note

Commands in this guide are bash (Azure Cloud Shell offers bash). In Windows PowerShell, rewrite loops such as for r in a b; do … done as foreach ($r in 'a','b') { … }, and replace the line-continuation character \ with a backtick (`). AWS is the run-validated reference path; Azure follows the same AETHER architecture.

Authenticate and pull the AETHER images from GHCR

AETHER images are published to GitHub Container Registry at ghcr.io/rizaanlakay/afroai. Authenticate Docker with a GitHub Personal Access Token (classic) that has read:packages, then pull the four component images. You re-tag and push them to ACR in Phase 3.

1 — Authenticate to GHCR

echo "<GITHUB_PAT>" | docker login ghcr.io -u <GITHUB_USERNAME> --password-stdin

2 — Pull all AETHER images

TAG=1.0.4
for r in agent-web agent-service agent-api agent-worker; do
  docker pull ghcr.io/rizaanlakay/afroai/$r:$TAG
done

Note

Images are at ghcr.io/rizaanlakay/afroai/{service}:<tag>; use the latest release tag. Nothing needs to be built from source.

Choose your sizing — production vs. low-cost evaluation

For a test deployment, use a Burstable Flexible Server (Standard_B1ms), Basic Redis (C0), and Container Apps min-replicas of 1 (set it to 0 to let the consumption plan scale an idle app to zero, at the cost of a cold start on the next request). Production targets about 1,000 concurrent conversations: a GeneralPurpose Flexible Server (4+ vCores), Standard Redis (C1+), and min-replicas of 3 for agent-web/agent-service. Phase 7 has stop/start scripts so you are not billed while idle.

Tip

The chat path is SignalR over a Redis backplane, so agent-web/agent-service scale out horizontally by adding replicas. The worker scales on RabbitMQ queue length via a KEDA rule.

2. Provision data & infrastructure

Create the managed PostgreSQL, Redis, object storage, message broker, and secrets.

Create Azure Database for PostgreSQL Flexible Server

Create a PostgreSQL 16 Flexible Server. AETHER stores the application schema and the RAG embeddings (3072-dim, pgvector) here. The admin user is the database admin (afroai_admin) — the app's web-user login is created in Phase 5. pgvector must be allow-listed via azure.extensions before initializing.

Production

az postgres flexible-server create \
  --resource-group afroai-rg --name afroai-pg \
  --tier GeneralPurpose --sku-name Standard_D4ds_v5 \
  --version 16 --admin-user afroai_admin --admin-password '<STRONG_PASSWORD>' \
  --database-name AfroAI

Test (Burstable)

az postgres flexible-server create \
  --resource-group afroai-rg --name afroai-pg \
  --tier Burstable --sku-name Standard_B1ms \
  --version 16 --admin-user afroai_admin --admin-password '<STRONG_PASSWORD>' \
  --database-name AfroAI --storage-size 32

Allow-list pgvector, then restart

az postgres flexible-server parameter set \
  --resource-group afroai-rg --server-name afroai-pg \
  --name azure.extensions --value VECTOR,UUID-OSSP
az postgres flexible-server restart --resource-group afroai-rg --name afroai-pg

Danger

Unlike AWS RDS (where pgvector is allowed by default), Azure requires VECTOR in azure.extensions; without it the initializer fails on CREATE EXTENSION vector. The admin username must use only letters, digits, and underscores (no hyphens).
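Since a hyphenated admin name only fails when the create command runs, a quick pre-flight check can catch it earlier. This is a minimal sketch: valid_pg_admin is a hypothetical helper (not part of AETHER or the Azure CLI), and it enforces only the letters/digits/underscore rule stated above, not any other naming restrictions Azure may apply.

```shell
# Hypothetical pre-flight check (not part of AETHER or the Azure CLI).
# Returns 0 only for a non-empty name made of letters, digits, and underscores.
valid_pg_admin() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*)
      echo "invalid admin username: '$1' (letters, digits, underscore only)" >&2
      return 1
      ;;
  esac
  return 0
}
```

For example, run valid_pg_admin afroai_admin before az postgres flexible-server create; it prints nothing and returns 0 for a valid name.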

Note

Open a firewall rule for your workstation's IP so you can run the one-time database initialization (Phase 5) from it:

az postgres flexible-server firewall-rule create \
  --resource-group afroai-rg --name afroai-pg --rule-name myip \
  --start-ip-address <YOUR_IP> --end-ip-address <YOUR_IP>

Remove the rule once initialization is done.

Create Azure Cache for Redis

Redis is the distributed cache (afroai: prefix) and the SignalR backplane (afroai-signalr: prefix). Connect on the SSL port 6380 with ssl=true.

Create the cache (C0 Basic for test, Standard C1+ for production)

az redis create --name afroai-redis --resource-group afroai-rg --location southafricanorth --sku Basic --vm-size C0

Note

Connection string: <name>.redis.cache.windows.net:6380,password=<key>,ssl=true,abortConnect=false. Azure Cache for Redis cannot be stopped — delete it to zero its cost when idle (Phase 7).
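The connection string above can be assembled from the cache name and an access key. A small sketch, assuming a hypothetical helper named redis_conn and the default redis.cache.windows.net host suffix of the Azure public cloud:

```shell
# Hypothetical helper: prints the connection string for an Azure Cache for
# Redis instance, using the TLS port 6380 as described above.
redis_conn() {
  name="$1"  # cache name, e.g. afroai-redis
  key="$2"   # access key from az redis list-keys
  printf '%s.redis.cache.windows.net:6380,password=%s,ssl=true,abortConnect=false\n' \
    "$name" "$key"
}
```

To fill in the key, fetch it with az redis list-keys --name afroai-redis --resource-group afroai-rg --query primaryKey -o tsv and pass it as the second argument. The output embeds the access key, so handle it as sensitive.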

Deploy MinIO for object storage (Blob is not S3-compatible)

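AETHER stores documents, artifacts, and knowledge files through the S3/MinIO API. Azure Blob Storage is not S3-compatible, so run MinIO as a Container App backed by an Azure Files volume. The create command below starts MinIO on the container's ephemeral filesystem: register an Azure Files share on the environment (az containerapp env storage set) and mount it at /data before you store real data, or objects are lost whenever the replica restarts. The Container Apps environment (afroai-cae) is created here and reused by RabbitMQ and all Phase 4 apps. AETHER uses a single bucket (Minio:BucketName). MinIO does not create buckets by itself, so if AETHER does not create it on first use, create it once with the MinIO client (mc mb).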

Create the Container Apps environment (shared by MinIO, RabbitMQ, and the app components)

az containerapp env create --name afroai-cae --resource-group afroai-rg --location southafricanorth

Create the MinIO Container App (internal ingress)

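# Pin MinIO to exactly one replica: its data lives under /data inside that replica.
az containerapp create \
  --name afroai-minio --resource-group afroai-rg --environment afroai-cae \
  --image quay.io/minio/minio:latest \
  --target-port 9000 --ingress internal \
  --min-replicas 1 --max-replicas 1 \
  --secrets minio-root-password='<STRONG>' \
  --env-vars MINIO_ROOT_USER=minioadmin MINIO_ROOT_PASSWORD=secretref:minio-root-password \
  --command "/bin/sh" "-c" "minio server /data --console-address :9001"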

Warning

Pointing Minio:Endpoint at an Azure Blob endpoint fails at runtime — Blob does not speak the S3 API. Point AETHER at the MinIO container's internal FQDN with Minio__Secure=false (internal HTTP). With MinIO you do not need Minio__Region (that is only for AWS S3 signing).

Provision RabbitMQ (CloudAMQP recommended)

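Azure has no managed RabbitMQ service. Use CloudAMQP (free Little Lemur plan) or run a RabbitMQ container app, and supply its URL to ConnectionStrings__queue: amqps://<user>:<pass>@<host>/<vhost> for CloudAMQP, or amqp://<user>:<pass>@<afroai-rabbit internal FQDN>:5672 (plain AMQP inside the environment) for Option B.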

Option A — CloudAMQP (free, recommended)

1. Sign up at cloudamqp.com, create a 'Little Lemur (Free)' instance
2. Copy its AMQP URL (amqps://user:pass@host/vhost)
3. Use it verbatim as the queue secret
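Before storing the CloudAMQP URL, a quick format check can catch copy-paste errors. A minimal sketch: amqp_vhost is a hypothetical helper that only inspects the string (it does not connect to the broker), accepts amqp:// or amqps:// URLs with user:pass@host, and prints the vhost path segment.

```shell
# Hypothetical sanity check for the queue secret. Accepts amqp:// or amqps://
# URLs of the form user:pass@host[:port][/vhost] and prints the vhost segment
# (an empty line if there is none). Does not contact the broker.
amqp_vhost() {
  url="$1"
  case "$url" in
    amqp://* | amqps://*) ;;
    *) echo "not an AMQP URL: $url" >&2; return 1 ;;
  esac
  rest="${url#*://}"          # user:pass@host[:port][/vhost]
  case "$rest" in
    *:*@*) ;;
    *) echo "missing user:pass@ in URL" >&2; return 1 ;;
  esac
  hostpart="${rest#*@}"       # host[:port][/vhost]
  case "$hostpart" in
    */*) printf '%s\n' "${hostpart#*/}" ;;
    *)   printf '\n' ;;
  esac
}
```

For example, amqp_vhost 'amqps://user:pass@host.example.com/myvhost' prints myvhost; a URL without credentials or with another scheme returns an error.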

Option B — RabbitMQ container app

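# AMQP is TCP, not HTTP, so the app needs TCP ingress on 5672. TCP ingress may
# require a VNet-integrated environment; if afroai-cae rejects it, use Option A.
az containerapp create \
  --name afroai-rabbit --resource-group afroai-rg --environment afroai-cae \
  --image rabbitmq:3.13-management \
  --target-port 5672 --exposed-port 5672 --transport tcp --ingress internal \
  --min-replicas 1 --max-replicas 1 \
  --env-vars RABBITMQ_DEFAULT_USER=afroai RABBITMQ_DEFAULT_PASS='<STRONG_PASSWORD>'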

Warning

Do not substitute Azure Service Bus. AETHER's worker uses the MassTransit RabbitMQ transport; switching would require rebuilding the binaries. AETHER configures the bus with cfg.Host(new Uri(url)), so CloudAMQP's URL with a /vhost works as-is.

Store secrets (Key Vault + Container Apps secrets)

AETHER reads these sensitive values. Store them in Key Vault and reference them as Container Apps secrets (via managed identity) in Phase 4. The env-var column shows the .NET config key (: → __).

Key Vault secret	Env var (Container App)	What it is
`db-connection`	`ConnectionStrings__DefaultConnection`	Postgres conn string incl. the `web-user` password
`queue`	`ConnectionStrings__queue`	CloudAMQP / RabbitMQ `amqps://` URL
`openai-key`	`KernelMemory__AI__OpenAI__ApiKey`	Your OpenAI API key
`orchestrator-key`	`Services__OrchestratorApiKey`	Internal API key — you generate it
`mcp-key`	`Mcp__CredentialEncryptionKey`	AES-256 key, base64 of exactly 32 bytes
`minio-access-key`	`Minio__AccessKey`	MinIO root user (or access key)
`minio-secret-key`	`Minio__SecretKey`	MinIO root password (or secret key)
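The env-var column applies one mechanical rule: every : in a .NET configuration key becomes __. A one-line sketch of that mapping (to_env_key is a hypothetical name):

```shell
# Hypothetical helper: maps a .NET configuration key to the environment
# variable name .NET reads by replacing every ':' with '__'.
to_env_key() {
  printf '%s\n' "$1" | sed 's/:/__/g'
}
```

For example, to_env_key Mcp:CredentialEncryptionKey prints Mcp__CredentialEncryptionKey.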

Set KernelMemory__Services__Postgres__ConnectionString to the same value as db-connection (KernelMemory's RAG store reuses the database).

Create the vault and the two generated keys (32-byte base64)

az keyvault create --name afroai-kv --resource-group afroai-rg --location southafricanorth
az keyvault secret set --vault-name afroai-kv --name orchestrator-key --value "$(openssl rand -base64 32)"
az keyvault secret set --vault-name afroai-kv --name mcp-key          --value "$(openssl rand -base64 32)"

Danger

Services:OrchestratorApiKey and Mcp:CredentialEncryptionKey must be byte-for-byte identical across agent-web, agent-service, and agent-api — a mismatch breaks internal auth and MCP credential decryption. The MCP key must decode to exactly 32 bytes.
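Because a key that does not decode to exactly 32 bytes breaks MCP credential decryption, it is worth checking the stored value. A minimal sketch using coreutils base64; check_mcp_key is a hypothetical helper:

```shell
# Hypothetical helper: succeeds only if the base64 value decodes to exactly
# 32 bytes, the length required for Mcp:CredentialEncryptionKey.
check_mcp_key() {
  n=$(printf '%s' "$1" | base64 -d 2>/dev/null | wc -c | tr -d ' ')
  if [ "$n" -eq 32 ]; then
    return 0
  fi
  echo "key decodes to $n bytes, expected 32" >&2
  return 1
}
```

For example: check_mcp_key "$(az keyvault secret show --vault-name afroai-kv --name mcp-key --query value -o tsv)". Keys generated with openssl rand -base64 32, as above, pass this check.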

Note

Endpoints, model names, and the bucket name are not secrets: set Minio:Endpoint/Secure/BucketName, Services:AgentService, and KernelMemory:AI:OpenAI:TextModel/EmbeddingModel as plain env vars in Phase 4. Redis:ConnectionString is the exception: it embeds the cache access key (password=<key>), so store it as a secret as well.

3. Push images to ACR

Mirror the AETHER images into your private Azure Container Registry.

Create an Azure Container Registry

Create one ACR. The Standard SKU is sufficient.

Create the registry

az acr create --resource-group afroai-rg --name afroaiacr --sku Standard

Re-tag the GHCR images and push to ACR

Authenticate Docker to ACR, then re-tag each image pulled in Phase 1 and push.

Login to ACR, re-tag from GHCR, push

TAG=1.0.4
az acr login --name afroaiacr
for r in agent-web agent-service agent-api agent-worker; do
  docker tag  ghcr.io/rizaanlakay/afroai/$r:$TAG \
              afroaiacr.azurecr.io/afroai/$r:$TAG
  docker push afroaiacr.azurecr.io/afroai/$r:$TAG
done

4. Deploy the services

Run the AETHER components on Azure Container Apps.

Prepare the per-app configuration

Each Container App receives configuration as --env-vars (plain) and --secrets (Key Vault references) at creation time. Prepare these before the create commands. Azure-specific endpoints:

ConnectionStrings__DefaultConnection — Flexible Server FQDN, port 5432, SslMode=Require
Redis__ConnectionString — Redis FQDN, port 6380, ssl=true,abortConnect=false
Minio__Endpoint — internal FQDN of afroai-minio; Minio__Secure=false; Minio__BucketName=afroai-artifacts
Services__AgentService — internal FQDN of afroai-agent-service
KernelMemory__AI__OpenAI__TextModel=gpt-4o-mini, KernelMemory__AI__OpenAI__EmbeddingModel=text-embedding-3-large
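The DefaultConnection entry above can be assembled in one place, which also makes the later switch to PgBouncer's port 6432 (Phase 6) a one-argument change. A sketch under stated assumptions: pg_conn is hypothetical, it targets the AfroAI database as web-user with Npgsql-style keywords, and it assumes the password contains no ; character (which would need quoting).

```shell
# Hypothetical helper: prints the Npgsql connection string used for
# ConnectionStrings__DefaultConnection (and the identical
# KernelMemory__Services__Postgres__ConnectionString).
# Port defaults to 5432; pass 6432 once built-in PgBouncer is enabled.
# Assumes the password contains no ';' (that would require quoting).
pg_conn() {
  server="$1"          # Flexible Server name, e.g. afroai-pg
  password="$2"        # web-user password from Phase 5
  port="${3:-5432}"
  printf 'Host=%s.postgres.database.azure.com;Port=%s;Database=AfroAI;Username=web-user;Password=%s;SslMode=Require\n' \
    "$server" "$port" "$password"
}
```

For example, pg_conn afroai-pg '<WEB_USER_PWD>' for the direct connection, or pg_conn afroai-pg '<WEB_USER_PWD>' 6432 after PgBouncer is on. Store the result in the db-connection secret.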

Warning

Per-component env: agent-service = db + queue + openai + orchestrator-key + mcp-key (no Redis/Minio). agent-api = db + orchestrator-key + Services__AgentService. agent-worker = db + queue + openai + minio, and uses DOTNET_ENVIRONMENT=Production (it is a Worker host, not ASP.NET). The worker image is built from Dockerfile.worker which bundles Python for the code/image sandbox.

Danger

If you configure a Container Apps health probe, point it at /, not /health. AETHER only maps /health in Development, so a probe on it fails in production. (Without custom probes, Container Apps adds default TCP probes on the ingress target port, which work fine.)

Create the Container Apps per component

agent-web gets external ingress (it serves the UI). agent-api is external if you expose the public REST API. agent-service uses internal ingress. agent-worker is a Worker host with no HTTP listener, so it needs no ingress. Container Apps provides HTTPS ingress automatically.

agent-web (external ingress, one replica until Phase 6 configures shared Data Protection keys)

az containerapp create \
  --name afroai-agent-web --resource-group afroai-rg --environment afroai-cae \
  --image afroaiacr.azurecr.io/afroai/agent-web:1.0.4 --registry-server afroaiacr.azurecr.io \
  --target-port 8080 --ingress external --min-replicas 1 --max-replicas 1

agent-service + agent-api (internal / external as needed)

az containerapp create \
  --name afroai-agent-service --resource-group afroai-rg --environment afroai-cae \
  --image afroaiacr.azurecr.io/afroai/agent-service:1.0.4 --registry-server afroaiacr.azurecr.io \
  --target-port 8080 --ingress internal --min-replicas 1 --max-replicas 20

az containerapp create \
  --name afroai-agent-api --resource-group afroai-rg --environment afroai-cae \
  --image afroaiacr.azurecr.io/afroai/agent-api:1.0.4 --registry-server afroaiacr.azurecr.io \
  --target-port 8080 --ingress external --min-replicas 1 --max-replicas 20

agent-worker (no ingress, KEDA RabbitMQ scaler)

# KEDA's RabbitMQ scaler reads the AMQP URL from its 'host' auth parameter,
# mapped here to the app secret 'queue'.
az containerapp create \
  --name afroai-agent-worker --resource-group afroai-rg --environment afroai-cae \
  --image afroaiacr.azurecr.io/afroai/agent-worker:1.0.4 --registry-server afroaiacr.azurecr.io \
  --min-replicas 1 --max-replicas 20 \
  --secrets "queue=<AMQP_URL>" \
  --env-vars DOTNET_ENVIRONMENT=Production ConnectionStrings__queue=secretref:queue \
  --scale-rule-name rabbitmq --scale-rule-type rabbitmq \
  --scale-rule-metadata "queueName=run-python" "mode=QueueLength" "value=5" \
  --scale-rule-auth "host=queue"
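The worker's queue-length rule (a target of 5 messages per replica, up to 20 replicas) can be reasoned about with simple arithmetic: the desired replica count is roughly the queue length divided by the target, rounded up and clamped to the min/max replicas. A simplified sketch; desired_replicas is hypothetical and ignores KEDA's polling interval and the autoscaler's tolerance and stabilization windows.

```shell
# Simplified model of queue-length autoscaling (hypothetical helper):
# desired = ceil(messages / target), clamped to [min, max].
# Real replica counts lag behind and smooth this value.
desired_replicas() {
  messages="$1"; target="$2"; min="$3"; max="$4"
  n=$(( (messages + target - 1) / target ))  # integer ceiling division
  if [ "$n" -lt "$min" ]; then n="$min"; fi
  if [ "$n" -gt "$max" ]; then n="$max"; fi
  echo "$n"
}
```

For example, desired_replicas 12 5 1 20 gives 3, and desired_replicas 500 5 1 20 hits the cap of 20. The same arithmetic roughly applies to the Phase 6 HTTP rule for agent-web, with concurrent requests in place of queued messages and 50 as the target.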

Note

Enable session affinity (sticky sessions) on the agent-web ingress so each SignalR client keeps reaching the same replica: az containerapp ingress sticky-sessions set --name afroai-agent-web --resource-group afroai-rg --affinity sticky. Image generation can take 70 to 120 seconds. Container Apps' default request timeout is generous, but if you front the app with another proxy, raise that proxy's timeout.

5. Initialize the AfroAI database

Create the schema, seed reference data, and the app login role.

Apply the schema + seed data

Connect to the Flexible Server as afroai_admin (firewall rule from Phase 2) and apply the bundled schema and seed — they ship with the installer and reflect a known-good database. The EF migration chain has drifted from the model, so the schema dump is the reliable path.

Apply schema + seed (download both from the installer's /Database page)

export PGPASSWORD='<AFROAI_ADMIN_PASSWORD>'
HOST=afroai-pg.postgres.database.azure.com
psql "host=$HOST port=5432 dbname=AfroAI user=afroai_admin sslmode=require" -v ON_ERROR_STOP=1 -f afroai-schema.sql
psql "host=$HOST port=5432 dbname=AfroAI user=afroai_admin sslmode=require" -v ON_ERROR_STOP=1 -f afroai-seed.sql

Create the app login role, then GRANT (run as afroai_admin on AfroAI)

CREATE ROLE "web-user" LOGIN PASSWORD '<WEB_USER_PWD>';
GRANT USAGE, CREATE ON SCHEMA public TO "web-user";
GRANT ALL ON ALL TABLES IN SCHEMA public TO "web-user";
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO "web-user";
ALTER DEFAULT PRIVILEGES FOR ROLE afroai_admin IN SCHEMA public GRANT ALL ON TABLES TO "web-user";
ALTER DEFAULT PRIVILEGES FOR ROLE afroai_admin IN SCHEMA public GRANT ALL ON SEQUENCES TO "web-user";

Danger

The app connects as web-user, but tables are owned by afroai_admin — without the GRANTs above, every page fails with "permission denied for table …". The seed includes the agent-creation reference data (categories, languages, tones, creativity levels, response lengths) and an initial admin user — change that user's password before going live.

6. Scale & harden

Autoscaling, connection pooling, shared keys, backups, and observability.

Configure Container Apps autoscale rules

Add HTTP-concurrency scale rules for agent-web/agent-service; the worker already scales on queue length. Before scaling agent-web past one replica, configure shared Data Protection keys (persist to Redis) and session affinity, or cookies/SignalR break across replicas.

HTTP concurrency rule for agent-web

az containerapp update \
  --name afroai-agent-web --resource-group afroai-rg \
  --min-replicas 3 --max-replicas 20 \
  --scale-rule-name http-concurrency --scale-rule-type http \
  --scale-rule-metadata "concurrentRequests=50"

Warning

ASP.NET Core Data Protection keys default to local disk and are not shared across replicas. Configure a shared key ring (Redis) before running agent-web at more than one replica, or logins/antiforgery break.

Enable PgBouncer + backups

Flexible Server has PgBouncer built in on the General Purpose and Memory Optimized tiers (it is not available on Burstable). Enable it and point ConnectionStrings__DefaultConnection at port 6432 to avoid exhausting max_connections. You can extend backup retention here; geo-redundant backup can only be chosen when the server is created, so add --geo-redundant-backup Enabled to the Phase 2 create command if you need it.

Enable PgBouncer and extend backup retention

az postgres flexible-server parameter set --resource-group afroai-rg --server-name afroai-pg --name pgbouncer.enabled --value true
az postgres flexible-server update --resource-group afroai-rg --name afroai-pg --backup-retention 14

Tip

Once PgBouncer is on, switch the connection string port to 6432. The KernelMemory RAG tables (km- prefix) grow with ingested knowledge — watch storage.

Observability

AETHER emits OpenTelemetry (via Aspire ServiceDefaults). Send it to Application Insights / Azure Monitor.

Create Application Insights

az monitor app-insights component create --app afroai-insights --resource-group afroai-rg --location southafricanorth --kind web

7. Operations & cost control

Watch the logs, and stop/start the stack so you are not billed while idle.

Monitor logs and app health

Tail each Container App's logs and check replica status.

Live tail one app's logs (repeat per component)

az containerapp logs show --name afroai-agent-worker --resource-group afroai-rg --follow

Replica status across the apps

for a in agent-web agent-service agent-api agent-worker; do
  echo "== $a =="
  az containerapp replica list --name afroai-$a --resource-group afroai-rg -o table
done

Tip

A resource that 404s only in production is often a Linux case-sensitivity issue: the container filesystem is case-sensitive, while Windows development machines are not. The error page hides the real exception; the logs have the stack trace.
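To confirm a case-sensitivity 404 for a static file, compare the requested path with what is actually on disk. A hypothetical diagnostic (case_check is not part of AETHER); run it in a shell inside the container, if the image ships one (for example via az containerapp exec), against the app's web root, whose location is an assumption you should confirm.

```shell
# Hypothetical diagnostic: for a directory (such as the app's web root) and a
# requested relative path, report "exact", "case-mismatch: <actual path>",
# or "missing". Relies on find's -ipath (case-insensitive path match).
case_check() {
  dir="$1"; rel="$2"
  if [ -e "$dir/$rel" ]; then
    echo "exact"
    return 0
  fi
  match=$(find "$dir" -ipath "$dir/$rel" 2>/dev/null | head -n 1)
  if [ -n "$match" ]; then
    echo "case-mismatch: $match"
  else
    echo "missing"
  fi
}
```

A case-mismatch result means the reference and the file name differ only in casing; fix the reference (or rename the file) rather than the server configuration.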

Shut everything down (stop idle billing)

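Scale every app to zero and stop the database. Consumption-plan Container Apps scale to zero on their own only when min-replicas is 0, and the apps in this guide run with min-replicas of 1 or more, so the loop pins both limits to 0. Drop afroai-rabbit from the list if you use CloudAMQP.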

Scale all apps to 0 and stop Postgres

for app in afroai-agent-web afroai-agent-service afroai-agent-api afroai-agent-worker afroai-minio afroai-rabbit; do
  az containerapp update --name $app --resource-group afroai-rg --min-replicas 0 --max-replicas 0
done
az postgres flexible-server stop --resource-group afroai-rg --name afroai-pg

Warning

Azure Cache for Redis cannot be stopped — delete it to zero its cost (and recreate on startup). A stopped Flexible Server auto-starts after 7 days; storage is still billed while stopped.

Start everything back up

Start the database first, then scale the apps back to their running replica counts. Drop afroai-rabbit from these commands if you use CloudAMQP.

Start Postgres, then restore replicas

az postgres flexible-server start --resource-group afroai-rg --name afroai-pg
az containerapp update --name afroai-minio  --resource-group afroai-rg --min-replicas 1 --max-replicas 1
az containerapp update --name afroai-rabbit --resource-group afroai-rg --min-replicas 1 --max-replicas 1
for app in afroai-agent-service afroai-agent-web afroai-agent-worker afroai-agent-api; do
  az containerapp update --name $app --resource-group afroai-rg --min-replicas 1 --max-replicas 20
done
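Scaling the apps up before Postgres finishes starting can leave them failing their first connections, so it can help to wait for the server between the two steps. A minimal polling sketch: wait_until and pg_ready are hypothetical helpers, and pg_ready assumes the server's state property reads Ready once it is up.

```shell
# Hypothetical polling helper: runs a command until it succeeds, trying up to
# <attempts> times with <delay> seconds between tries. Returns 1 on timeout.
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts: $*" >&2
  return 1
}

# Hypothetical readiness check: assumes the server's state reads "Ready"
# once it has started.
pg_ready() {
  [ "$(az postgres flexible-server show --resource-group afroai-rg \
        --name afroai-pg --query state -o tsv)" = "Ready" ]
}
```

For example, run wait_until 30 10 pg_ready after az postgres flexible-server start and before the replica updates; it polls every 10 seconds for up to 30 attempts.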

Note

Start agent-service before agent-web. If you deleted Redis on shutdown, recreate it (Phase 2). The new cache has new access keys, so update the Redis connection string (Redis__ConnectionString) before scaling the apps up.