Deploy AETHER on Amazon Web Services

1. Prerequisites

Accounts, tooling, and access you need before provisioning anything.

Install and authenticate the AWS CLI

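Install the AWS CLI v2 and configure an IAM principal allowed to create ECS, RDS, ElastiCache, S3, ECR, IAM, CloudWatch Logs, and Secrets Manager resources, plus the EC2 security groups, load balancer, Cloud Map namespace, and Application Auto Scaling policies used in Phases 4 and 6. Pick a single region for the whole deployment (this guide uses af-south-1).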

Configure and verify the CLI

aws --version
aws configure          # enter key, secret, region (af-south-1), output (json)
aws sts get-caller-identity

Note

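Commands in this guide are PowerShell (the AETHER installer runs on Windows). On Linux/macOS, rewrite each loop foreach ($x in "a","b") { … } as for x in a b; do … done, replace the backtick line continuation (`) with a backslash (\), and replace each @' … '@ | Out-File here-string with a quoted heredoc (cat > file <<'EOF' … EOF).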

Warning

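af-south-1 (Cape Town) is an opt-in region and is disabled by default. Enable it for your account before you provision anything, either in the console under Account → AWS Regions or with aws account enable-region --region-name af-south-1; requests to a region that is not enabled are rejected. In the console, set the region selector (top-right) to Africa (Cape Town) or you will not see your resources.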

Authenticate and pull the AETHER images from GHCR

AETHER images are published to GitHub Container Registry at ghcr.io/rizaanlakay/afroai. Authenticate Docker with a GitHub Personal Access Token (classic) that has the read:packages scope, then pull the four component images. You re-tag and push them to your private ECR in Phase 3.

1 — Authenticate to GHCR

"<GITHUB_PAT>" | docker login ghcr.io -u <GITHUB_USERNAME> --password-stdin

2 — Pull all AETHER images

$TAG = "1.0.4"
foreach ($r in "agent-web","agent-service","agent-api","agent-worker") {
  docker pull "ghcr.io/rizaanlakay/afroai/${r}:$TAG"
}

Note

Images are at ghcr.io/rizaanlakay/afroai/{service}:<tag> — use the latest release tag. Docker Desktop must be installed and running. You never build from source.

Choose your sizing — production vs. low-cost evaluation

Every provisioning step below shows a production size and a test / free-tier size. For a working evaluation, use the test sizes everywhere — the whole footprint then sits in or near the AWS Free Tier (the one exception is the message broker; see Phase 2). When you are done testing, Phase 7 has a one-shot script to stop everything so you are not billed while idle.

Tip

Production target for ~1,000 concurrent conversations: 3+ agent-web and agent-service tasks, 2+ agent-api/agent-worker, a Multi-AZ RDS (db.r6g.xlarge+), ElastiCache with a replica, and Postgres connection pooling (Phase 6). The chat path is SignalR over a Redis backplane, so agent-web and agent-service can scale out horizontally once the shared Data Protection keys and ALB stickiness from Phase 6 are in place.

2. Provision data & infrastructure

Create the managed PostgreSQL, Redis, object storage, message broker, and secrets.

Create RDS for PostgreSQL

Create a PostgreSQL 16/17 instance. AETHER stores both the application schema and the RAG embeddings (3072-dim, pgvector) here. The master user is the admin (afroai_admin) — not the app login. The app's web-user role is created later in Phase 5.

Production

aws rds create-db-instance `
  --db-instance-identifier afroai-pg `
  --engine postgres `
  --db-instance-class db.r6g.xlarge `
  --allocated-storage 100 --storage-type gp3 `
  --master-username afroai_admin --master-user-password '<STRONG_PASSWORD>' `
  --db-name AfroAI --multi-az --no-publicly-accessible

Test / free-tier

aws rds create-db-instance `
  --db-instance-identifier afroai-pg `
  --engine postgres `
  --db-instance-class db.t3.micro `
  --allocated-storage 20 --storage-type gp2 `
  --master-username afroai_admin --master-user-password '<STRONG_PASSWORD>' `
  --db-name AfroAI --no-multi-az --no-publicly-accessible

Danger

Boolean flags take no value. Use --no-publicly-accessible / --no-multi-az — not --publicly-accessible false (that errors). The master username must be letters/digits/underscore only (no hyphens).

Warning

Do not pin --engine-version to a minor that may not exist in your region (e.g. 17.2 fails). Omit it to get the default, or list options:

aws rds describe-db-engine-versions --engine postgres --query "DBEngineVersions[?starts_with(EngineVersion,'17')].EngineVersion" --output table

Note

pgvector is allow-listed by default on RDS (rds.allowed_extensions = *) — no custom parameter group needed. The initializer creates the extension automatically. (The allow-list gotcha only bites Azure/GCP.)

Make RDS reachable for the one-time DB initialization

The instance is created private. To run the AETHER DB-init tool (or psql) from your workstation in Phase 5, temporarily make it publicly reachable and open port 5432 to your IP.

Enable public access + grab the endpoint

aws rds modify-db-instance --db-instance-identifier afroai-pg --publicly-accessible --apply-immediately
# After it returns to 'available', the endpoint:
aws rds describe-db-instances --db-instance-identifier afroai-pg --query "DBInstances[0].Endpoint.Address" --output text

Open 5432 to your IP (find the instance's VPC security group in the console)

aws ec2 authorize-security-group-ingress --group-id <RDS_SG> --protocol tcp --port 5432 --cidr <YOUR_IP>/32

Danger

This exposes the database to the internet (restricted to your IP). It is fine for a one-time init, but revert it afterwards (--no-publicly-accessible + remove the rule). In production, run the init from inside the VPC (bastion or ECS exec) instead.

Create ElastiCache for Redis

Redis is the distributed cache (afroai: prefix) and the SignalR backplane (afroai-signalr: prefix). Keep transit encryption on so the app connects with ssl=true.

Production

aws elasticache create-replication-group `
  --replication-group-id afroai-redis `
  --replication-group-description 'AETHER cache + SignalR backplane' `
  --engine redis --cache-node-type cache.r6g.large `
  --num-node-groups 1 --replicas-per-node-group 1 `
  --automatic-failover-enabled --transit-encryption-enabled

Test / free-tier (single node, no failover)

aws elasticache create-replication-group `
  --replication-group-id afroai-redis `
  --replication-group-description 'AETHER cache + SignalR backplane' `
  --engine redis --cache-node-type cache.t3.micro `
  --num-node-groups 1 --replicas-per-node-group 0 `
  --transit-encryption-enabled

Get the primary endpoint (needed for the agent-web env in Phase 4)

aws elasticache describe-replication-groups --replication-group-id afroai-redis `
  --query "ReplicationGroups[0].NodeGroups[0].PrimaryEndpoint" --output json

Note

The connection string is <primary-endpoint>:6379,ssl=true,abortConnect=false. Only the deployed services reach Redis (in-VPC) — do not expose it publicly. ElastiCache cannot be stopped; to zero its cost when idle, delete and recreate it (see Phase 7).
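The connection string in the note above can be assembled from the JSON that the describe-replication-groups query prints. This is a minimal Python sketch, assuming that output was captured as text; the function name redis_connection_string and the sample endpoint are hypothetical, and the option string is the one this guide gives.

```python
import json


def redis_connection_string(primary_endpoint_json: str) -> str:
    """Build the Redis connection string format used in this guide.

    Expects the JSON object printed by the PrimaryEndpoint query,
    which carries "Address" and "Port" fields.
    """
    endpoint = json.loads(primary_endpoint_json)
    host = endpoint["Address"]
    port = endpoint.get("Port", 6379)  # fall back to the port named in the guide
    # ssl=true pairs with --transit-encryption-enabled on the replication group.
    return f"{host}:{port},ssl=true,abortConnect=false"


if __name__ == "__main__":
    # Made-up endpoint, for illustration only.
    sample = '{"Address": "master.afroai-redis.example.cache.amazonaws.com", "Port": 6379}'
    print(redis_connection_string(sample))
```

The result is the value for Redis__ConnectionString in Phase 4.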

Create one S3 bucket + a scoped IAM user

AETHER stores documents, generated artifacts, and knowledge files via the S3 API. It uses a single bucket for all object types (configured by Minio:BucketName). Create one bucket and an IAM user scoped to just that bucket.

1 — Create the bucket (name must be globally unique)

aws s3api create-bucket --bucket afroai-artifacts `
  --region af-south-1 --create-bucket-configuration LocationConstraint=af-south-1

2 — Scoped IAM policy (save as s3-policy.json)

@'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Sid": "ListBucket", "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::afroai-artifacts" },
    { "Sid": "ObjectRW", "Effect": "Allow", "Action": ["s3:GetObject","s3:PutObject","s3:DeleteObject"], "Resource": "arn:aws:s3:::afroai-artifacts/*" }
  ]
}
'@ | Out-File -FilePath s3-policy.json -Encoding ascii

3 — Create the user, attach the policy, make an access key

aws iam create-user --user-name afroai-s3
aws iam put-user-policy --user-name afroai-s3 --policy-name afroai-s3-access --policy-document file://s3-policy.json
aws iam create-access-key --user-name afroai-s3   # copy AccessKeyId + SecretAccessKey (shown once)

Warning

S3 bucket names are global. If afroai-artifacts is taken, add a suffix (e.g. your account id) and use that name everywhere — both ARNs in the policy and the Minio__BucketName env var in Phase 4.
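If you do rename the bucket, generating the policy from the name keeps both ARNs in step. The following is a small Python sketch under that assumption; s3_policy is a hypothetical helper and the suffixed bucket name is a made-up example. It reproduces the statements from step 2 unchanged.

```python
import json


def s3_policy(bucket: str) -> dict:
    """Return the scoped policy from step 2 for an arbitrary bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {"Sid": "ListBucket", "Effect": "Allow",
             "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            # Object reads/writes/deletes apply to keys inside the bucket.
            {"Sid": "ObjectRW", "Effect": "Allow",
             "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
             "Resource": f"{bucket_arn}/*"},
        ],
    }


if __name__ == "__main__":
    # Example name only; substitute the bucket you actually created.
    with open("s3-policy.json", "w", encoding="ascii") as fh:
        json.dump(s3_policy("afroai-artifacts-123456789012"), fh, indent=2)
```

Use the same bucket name for Minio__BucketName in Phase 4.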

Note

AETHER's MinIO SDK needs the region for S3 SigV4 signing — you set Minio__Region=af-south-1 in Phase 4. The endpoint is s3.af-south-1.amazonaws.com with Minio__Secure=true.

Provision the RabbitMQ broker (CloudAMQP recommended)

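The worker consumes jobs over RabbitMQ (MassTransit transport). Amazon MQ is the only AWS-managed RabbitMQ, and this guide sizes it at mq.m5.large, which is not free. Instance types and free-tier eligibility for RabbitMQ vary by region and engine version, so list what af-south-1 offers with aws mq describe-broker-instance-options --engine-type RABBITMQ before assuming a smaller size is available. For a low-cost deployment, use CloudAMQP's free plan instead.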

Option A — CloudAMQP (free, recommended)

1. Sign up at cloudamqp.com
2. Create instance: plan 'Little Lemur (Free)', region closest to af-south-1
3. Open the instance, copy the AMQP URL (amqps://user:pass@host/vhost)
4. Use that URL verbatim as the afroai/queue secret in the next step

Option B — Amazon MQ (managed, NOT free; delete when done)

aws mq create-broker `
  --broker-name afroai-rabbit `
  --engine-type RABBITMQ --engine-version 3.13 `
  --deployment-mode SINGLE_INSTANCE `
  --host-instance-type mq.m5.large `
  --users 'Username=afroai,Password=<STRONG_PASSWORD>' `
  --no-publicly-accessible --auto-minor-version-upgrade

Warning

AETHER's worker is built for the RabbitMQ transport — SQS / Service Bus are not compatible and would require rebuilding the binaries. The queue value is always an amqps:// URL. AETHER configures the bus with cfg.Host(new Uri(url)), so CloudAMQP's amqps://user:pass@host/vhost works as-is.
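A quick sanity check on the URL before storing it as the afroai/queue secret can catch a pasted amqp:// or truncated value. This Python sketch is illustrative only: check_queue_url is a hypothetical helper, and it uses the standard library URL parser, not AETHER's or MassTransit's own parsing.

```python
from urllib.parse import urlparse


def check_queue_url(url: str) -> dict:
    """Split an AMQP URL into parts and reject anything that is not amqps://."""
    parts = urlparse(url)
    if parts.scheme != "amqps":
        raise ValueError(f"expected an amqps:// URL, got scheme {parts.scheme!r}")
    if not parts.hostname or not parts.username or parts.password is None:
        raise ValueError("URL must include user, password, and host")
    return {
        "host": parts.hostname,
        "port": parts.port or 5671,  # 5671 is the standard AMQP-over-TLS port
        "user": parts.username,
        "vhost": parts.path.lstrip("/") or "/",
    }


if __name__ == "__main__":
    print(check_queue_url("amqps://user:pass@host/vhost"))
```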

Danger

Amazon MQ brokers cannot be stopped — billing only stops when you delete-broker. At ~$0.30+/hr for mq.m5.large that is ~$220/mo if left running. CloudAMQP free has nothing to stop.
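The monthly figure above is the hourly rate multiplied by the hours in an average month. A short Python sketch of that arithmetic follows; the $0.30 rate is the approximate figure quoted in this guide, not a price lookup, and monthly_cost is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_cost(hourly_rate_usd: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of leaving an hourly-billed resource running for a month."""
    return round(hourly_rate_usd * hours, 2)


if __name__ == "__main__":
    print(monthly_cost(0.30))  # 219.0, i.e. the "~$220/mo" quoted above
```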

Store secrets in AWS Secrets Manager

AETHER reads these sensitive values; store each as a secret and reference it from the ECS task definitions in Phase 4. The env-var column shows the .NET configuration key (: becomes __).

Secret name	Env var (task definition)	What it is
`afroai/db-connection`	`ConnectionStrings__DefaultConnection`	Full Postgres connection string incl. the `web-user` password
`afroai/queue`	`ConnectionStrings__queue`	RabbitMQ / CloudAMQP `amqps://` URL
`afroai/openai-key`	`KernelMemory__AI__OpenAI__ApiKey`	Your OpenAI API key — chat, embeddings, RAG, images
`afroai/orchestrator-key`	`Services__OrchestratorApiKey`	Internal API key (Web/Service↔API auth) — you generate it
`afroai/mcp-key`	`Mcp__CredentialEncryptionKey`	AES-256 key, base64 of exactly 32 bytes — you generate it
`afroai/minio-access-key`	`Minio__AccessKey`	Access key id of the `afroai-s3` IAM user
`afroai/minio-secret-key`	`Minio__SecretKey`	Secret access key of the `afroai-s3` IAM user

KernelMemory's RAG store reuses the same database — set KernelMemory__Services__Postgres__ConnectionString to the same value as afroai/db-connection in Phase 4.
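The key-to-env-var rule in the table is mechanical: every colon in the .NET configuration key becomes a double underscore. A minimal Python sketch, with to_env_var as a hypothetical helper name:

```python
def to_env_var(config_key: str) -> str:
    """Map a .NET configuration key to its environment-variable name."""
    return config_key.replace(":", "__")


if __name__ == "__main__":
    for key in ("ConnectionStrings:DefaultConnection",
                "Services:OrchestratorApiKey",
                "KernelMemory:Services:Postgres:ConnectionString"):
        print(f"{key} -> {to_env_var(key)}")
```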

1 — Generate the two internal keys (32-byte base64; reuse across services)

$rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
$b = New-Object byte[] 32
$rng.GetBytes($b); $ORCH_KEY = [Convert]::ToBase64String($b)
$rng.GetBytes($b); $MCP_KEY  = [Convert]::ToBase64String($b)
Write-Host "Orchestrator: $ORCH_KEY"
Write-Host "MCP:          $MCP_KEY"
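Because the MCP key must decode to exactly 32 bytes, it is worth verifying a value before it goes into Secrets Manager, especially if it was copied by hand. The Python sketch below is one way to check it; is_valid_key is a hypothetical helper, not part of AETHER.

```python
import base64
import binascii
import secrets


def is_valid_key(value: str, size: int = 32) -> bool:
    """True if value is strict base64 that decodes to exactly `size` bytes."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False  # not valid base64 (bad characters or padding)
    return len(raw) == size


if __name__ == "__main__":
    # Generate a fresh 32-byte key the same way the PowerShell above does.
    key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")
    print(key, is_valid_key(key))
```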

2 — Store every secret

# DB connection — web-user + the password you'll set in Phase 5
aws secretsmanager create-secret --name afroai/db-connection `
  --secret-string "Host=afroai-pg.xxxx.af-south-1.rds.amazonaws.com;Port=5432;Database=AfroAI;Username=web-user;Password=<WEB_USER_PWD>;SslMode=Require;"

# Queue — the CloudAMQP (or Amazon MQ) amqps:// URL
aws secretsmanager create-secret --name afroai/queue --secret-string "amqps://user:pass@host/vhost"

aws secretsmanager create-secret --name afroai/openai-key      --secret-string "sk-..."
aws secretsmanager create-secret --name afroai/orchestrator-key --secret-string "$ORCH_KEY"
aws secretsmanager create-secret --name afroai/mcp-key          --secret-string "$MCP_KEY"
aws secretsmanager create-secret --name afroai/minio-access-key --secret-string "<AKIA...>"
aws secretsmanager create-secret --name afroai/minio-secret-key --secret-string "<SECRET>"

Danger

Services:OrchestratorApiKey and Mcp:CredentialEncryptionKey must each be byte-for-byte identical in every component that receives them (agent-web, agent-service, and agent-api for the orchestrator key; see the per-component list in Phase 4 for the MCP key). Generate each once and reuse it. A mismatch makes the API reject every orchestrator call and leaves MCP credentials undecryptable.

Note

Endpoints, model names, region, and bucket name are not secrets — set Redis:ConnectionString, Minio:Endpoint/Secure/Region/BucketName, Services:AgentService, and KernelMemory:AI:OpenAI:TextModel/EmbeddingModel as plain environment variables in Phase 4.

3. Push images to ECR

Mirror the AETHER images into your private registry.

Create ECR repositories

Create one repository per AETHER component.

Create repositories

foreach ($r in "agent-web","agent-service","agent-api","agent-worker") {
  aws ecr create-repository --repository-name "afroai/$r"
}

Re-tag the GHCR images and push to ECR

Authenticate Docker to ECR, then re-tag each image already pulled in Phase 1 and push. Note the ${r} braces: in PowerShell, "$r:$TAG" fails because $r: is parsed as a drive- or scope-qualified variable name.

Login to ECR, re-tag from GHCR, push

$TAG = "1.0.4"
$ACCOUNT = (aws sts get-caller-identity --query Account --output text)
$REGION = "af-south-1"
$REGISTRY = "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REGISTRY
foreach ($r in "agent-web","agent-service","agent-api","agent-worker") {
  docker tag  "ghcr.io/rizaanlakay/afroai/${r}:$TAG" "$REGISTRY/afroai/${r}:$TAG"
  docker push "$REGISTRY/afroai/${r}:$TAG"
}
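The loop above maps each GHCR image to an ECR name of the form <account>.dkr.ecr.<region>.amazonaws.com/afroai/<component>:<tag>. The Python sketch below prints that mapping so you can review it before pushing; retag_plan is a hypothetical helper and the account id is a placeholder.

```python
GHCR_PREFIX = "ghcr.io/rizaanlakay/afroai"
COMPONENTS = ["agent-web", "agent-service", "agent-api", "agent-worker"]


def retag_plan(account_id: str, region: str, tag: str) -> list:
    """Return (source, target) image references for every component."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return [
        (f"{GHCR_PREFIX}/{name}:{tag}", f"{registry}/afroai/{name}:{tag}")
        for name in COMPONENTS
    ]


if __name__ == "__main__":
    # 123456789012 is a placeholder account id.
    for source, target in retag_plan("123456789012", "af-south-1", "1.0.4"):
        print(f"{source} -> {target}")
```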

4. Deploy the services

ECS Fargate cluster, IAM roles, task definitions, networking, and the services.

Create the ECS cluster (+ service-linked role)

Create a Fargate cluster. On a brand-new account the ECS service-linked role may not exist yet, which makes cluster creation fail with "Unable to assume the service linked role" — create it first.

Create the role (harmless if it already exists) then the cluster

aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com   # 'has been taken' = already there, fine
aws ecs create-cluster --cluster-name afroai --capacity-providers FARGATE

Create the task execution role + log group

Fargate needs an execution role to pull from ECR, read your secrets, and write logs — and a CloudWatch log group to write into. Missing either is the #1 cause of tasks that never start ("unable to pull secrets" / "log group does not exist").

1 — Create the role with the right trust + policies

@'
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "ecs-tasks.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }
'@ | Out-File -FilePath ecs-trust.json -Encoding ascii
aws iam create-role --role-name ecsTaskExecutionRole --assume-role-policy-document file://ecs-trust.json
aws iam attach-role-policy --role-name ecsTaskExecutionRole `
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

@'
{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "secretsmanager:GetSecretValue", "Resource": "arn:aws:secretsmanager:af-south-1:<ACCOUNT_ID>:secret:afroai/*" } ] }
'@ | Out-File -FilePath secrets-read.json -Encoding ascii
aws iam put-role-policy --role-name ecsTaskExecutionRole --policy-name afroai-secrets-read --policy-document file://secrets-read.json

2 — Create the log group + list your secret ARNs

aws logs create-log-group --log-group-name /ecs/afroai
aws secretsmanager list-secrets --query "SecretList[?starts_with(Name,'afroai/')].[Name,ARN]" --output table

Warning

Registering a task definition does not validate that the role exists — it just stores the ARN string. If the role is missing, the failure only shows at service-launch time. Confirm with aws iam get-role --role-name ecsTaskExecutionRole.

Register the task definitions

One task definition per component. All four set executionRoleArn to the role above, networkMode: awsvpc, requiresCompatibilities: ["FARGATE"], and an awslogs log config pointing at /ecs/afroai. Plain config goes in environment; sensitive values reference the secret ARNs in secrets (valueFrom). Below is the agent-web example — the others are subsets.

agent-web environment + secrets (excerpt; cpu 1024 / memory 2048)

{
  "environment": [
    { "name": "ASPNETCORE_ENVIRONMENT", "value": "Production" },
    { "name": "Redis__ConnectionString", "value": "<redis-endpoint>:6379,ssl=true,abortConnect=false" },
    { "name": "Services__AgentService", "value": "http://agent-service.afroai:8080" },
    { "name": "Minio__Endpoint", "value": "s3.af-south-1.amazonaws.com" },
    { "name": "Minio__Secure", "value": "true" },
    { "name": "Minio__Region", "value": "af-south-1" },
    { "name": "Minio__BucketName", "value": "afroai-artifacts" },
    { "name": "KernelMemory__AI__OpenAI__TextModel", "value": "gpt-4o-mini" },
    { "name": "KernelMemory__AI__OpenAI__EmbeddingModel", "value": "text-embedding-3-large" },
    { "name": "BaseUrl", "value": "http://<ALB_DNS>" }
  ],
  "secrets": [
    { "name": "ConnectionStrings__DefaultConnection", "valueFrom": "<db-connection ARN>" },
    { "name": "KernelMemory__Services__Postgres__ConnectionString", "valueFrom": "<db-connection ARN>" },
    { "name": "ConnectionStrings__queue", "valueFrom": "<queue ARN>" },
    { "name": "KernelMemory__AI__OpenAI__ApiKey", "valueFrom": "<openai-key ARN>" },
    { "name": "Services__OrchestratorApiKey", "valueFrom": "<orchestrator-key ARN>" },
    { "name": "Mcp__CredentialEncryptionKey", "valueFrom": "<mcp-key ARN>" },
    { "name": "Minio__AccessKey", "valueFrom": "<minio-access-key ARN>" },
    { "name": "Minio__SecretKey", "valueFrom": "<minio-secret-key ARN>" }
  ]
}
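The excerpt shows only the environment and secrets arrays; a registrable file also needs the family, Fargate settings, and a container definition around them. The Python sketch below is one way to generate such a file. It assumes the family name afroai-<component> and container name <component> that the create-service commands in this guide refer to, uses the cpu/memory from the agent-web excerpt as defaults, and task_definition is a hypothetical helper rather than anything shipped with AETHER.

```python
import json


def task_definition(component, image, execution_role_arn, environment, secrets,
                    cpu="1024", memory="2048", port=8080, region="af-south-1"):
    """Build a Fargate task definition dict for one AETHER component."""
    container = {
        "name": component,
        "image": image,
        "essential": True,
        "environment": [{"name": k, "value": v} for k, v in environment.items()],
        "secrets": [{"name": k, "valueFrom": arn} for k, arn in secrets.items()],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/afroai",
                "awslogs-region": region,
                "awslogs-stream-prefix": component,
            },
        },
    }
    if port is not None:  # pass port=None for agent-worker (no HTTP port)
        container["portMappings"] = [{"containerPort": port, "protocol": "tcp"}]
    return {
        "family": f"afroai-{component}",
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,
        "memory": memory,
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [container],
    }


if __name__ == "__main__":
    # All values below are placeholders; fill in your own ARNs and image URI.
    td = task_definition(
        "agent-web",
        "<REGISTRY>/afroai/agent-web:1.0.4",
        "arn:aws:iam::<ACCOUNT_ID>:role/ecsTaskExecutionRole",
        {"ASPNETCORE_ENVIRONMENT": "Production"},
        {"ConnectionStrings__DefaultConnection": "<db-connection ARN>"},
    )
    with open("taskdef-agent-web.json", "w", encoding="ascii") as fh:
        json.dump(td, fh, indent=2)
```

The sizes for the other three components are not stated in this guide, so treat the defaults as a starting point only.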

Register each (after saving the four JSON files as taskdef-<component>.json, each with family afroai-<component>)

foreach ($c in "agent-service","agent-api","agent-worker","agent-web") {
  aws ecs register-task-definition --cli-input-json "file://taskdef-$c.json"
}

Note

Per-component differences:
agent-service = db + queue + openai + orchestrator-key + mcp-key (no Redis/Minio).
agent-api = db + orchestrator-key + Services__AgentService.
agent-worker = db + queue + openai + minio. It has no HTTP port, uses DOTNET_ENVIRONMENT rather than ASPNETCORE_ENVIRONMENT, and its image is built from Dockerfile.worker, which includes Python for the code sandbox.

Tip

The OpenAI model is set by KernelMemory__AI__OpenAI__TextModel; keep EmbeddingModel = text-embedding-3-large (3072-dim, matches the schema).

Networking — Cloud Map, the ALB, and security groups

Internal services find each other via Cloud Map DNS (agent-service.afroai); agent-web is published through an internet-facing ALB. Default-VPC subnets are public, so tasks use assignPublicIp=ENABLED to reach ECR / OpenAI / CloudAMQP (no NAT gateway needed).

1 — Cloud Map namespace + a discovery service for agent-service

aws servicediscovery create-private-dns-namespace --name afroai --vpc <VPC_ID>
# wait ~90s, then:
$NSID = (aws servicediscovery list-namespaces --query "Namespaces[?Name=='afroai'].Id" --output text)
$SDARN = (aws servicediscovery create-service --name agent-service --namespace-id $NSID `
  --dns-config "NamespaceId=$NSID,RoutingPolicy=MULTIVALUE,DnsRecords=[{Type=A,TTL=60}]" --query "Service.Arn" --output text)

2 — ALB security group + rule, ALB, target group, listener

$ALBSG = (aws ec2 create-security-group --group-name afroai-alb --description 'AETHER ALB' --vpc-id <VPC_ID> --query GroupId --output text)
aws ec2 authorize-security-group-ingress --group-id $ALBSG --protocol tcp --port 80 --cidr 0.0.0.0/0
# let the ALB reach tasks on 8080 (add to the default/task security group)
aws ec2 authorize-security-group-ingress --group-id <TASK_SG> --protocol tcp --port 8080 --source-group $ALBSG

$ALBARN = (aws elbv2 create-load-balancer --name afroai-web-alb --type application --scheme internet-facing --subnets <SUBNET_A> <SUBNET_B> --security-groups $ALBSG --query "LoadBalancers[0].LoadBalancerArn" --output text)
$TGARN = (aws elbv2 create-target-group --name afroai-web-tg --protocol HTTP --port 8080 --vpc-id <VPC_ID> --target-type ip --health-check-path "/" --matcher HttpCode=200-399 --query "TargetGroups[0].TargetGroupArn" --output text)
aws elbv2 create-listener --load-balancer-arn $ALBARN --protocol HTTP --port 80 --default-actions "Type=forward,TargetGroupArn=$TGARN"

3 — Raise the ALB idle timeout (image generation holds the request ~80s)

aws elbv2 modify-load-balancer-attributes --load-balancer-arn $ALBARN `
  --attributes "Key=idle_timeout.timeout_seconds,Value=300"

Danger

Set the target-group health check to / (the landing page), not /health. AETHER's health endpoints are only mapped in Development, so /health 404s in production and the task is killed in a loop. --matcher HttpCode=200-399 tolerates the page returning a redirect.

Note

HTTP-only ALB is fine for testing — the auth cookie uses SameAsRequest, so it works over HTTP. For production add an ACM certificate + a 443 listener (HTTPS). The default 60s ALB idle timeout is raised to 300s above so long image-generation requests are not cut off.

Create the ECS services

Create one service per component. agent-service registers in Cloud Map; agent-web attaches to the ALB target group with a startup grace period; agent-worker and agent-api need no ingress. Use --desired-count 1 for a test.

Shared network config

$NET = "awsvpcConfiguration={subnets=[<SUBNET_A>,<SUBNET_B>],securityGroups=[<TASK_SG>],assignPublicIp=ENABLED}"

agent-service (Cloud Map) + agent-web (ALB)

aws ecs create-service --cluster afroai --service-name agent-service `
  --task-definition afroai-agent-service --desired-count 1 --launch-type FARGATE `
  --network-configuration $NET --service-registries "registryArn=$SDARN"

aws ecs create-service --cluster afroai --service-name agent-web `
  --task-definition afroai-agent-web --desired-count 1 --launch-type FARGATE `
  --network-configuration $NET `
  --load-balancers "targetGroupArn=$TGARN,containerName=agent-web,containerPort=8080" `
  --health-check-grace-period-seconds 180

agent-worker + agent-api (no ingress)

foreach ($s in "agent-worker","agent-api") {
  aws ecs create-service --cluster afroai --service-name $s `
    --task-definition afroai-$s --desired-count 1 --launch-type FARGATE `
    --network-configuration $NET
}

Warning

Run agent-web at a single replica for now: ASP.NET Core Data Protection keys default to local disk and are not shared across tasks, so multiple replicas break cookies/antiforgery. Scaling web past 1 needs a shared key ring (Redis) — see Phase 6.

5. Initialize the AfroAI database

Create the schema, seed reference data, and the app login role.

Apply the schema + seed data

Connect to RDS as afroai_admin (via the public access from Phase 2, or a bastion) and apply the bundled schema and seed. These ship with the installer and reflect a known-good database. psql is included with pgAdmin under runtime\psql.exe.

Apply schema + seed (download both from the installer's /Database page)

$env:PGPASSWORD = "<AFROAI_ADMIN_PASSWORD>"; $env:PGSSLMODE = "require"
$PROD = (aws rds describe-db-instances --db-instance-identifier afroai-pg --query "DBInstances[0].Endpoint.Address" --output text)
psql -h $PROD -U afroai_admin -d AfroAI -v ON_ERROR_STOP=1 -f afroai-schema.sql
psql -h $PROD -U afroai_admin -d AfroAI -v ON_ERROR_STOP=1 -f afroai-seed.sql

Create the app login role and grant privileges

-- run as afroai_admin on the AfroAI database
CREATE ROLE "web-user" LOGIN PASSWORD '<WEB_USER_PWD>';
GRANT USAGE, CREATE ON SCHEMA public TO "web-user";
GRANT ALL ON ALL TABLES IN SCHEMA public TO "web-user";
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO "web-user";
ALTER DEFAULT PRIVILEGES FOR ROLE afroai_admin IN SCHEMA public GRANT ALL ON TABLES TO "web-user";
ALTER DEFAULT PRIVILEGES FOR ROLE afroai_admin IN SCHEMA public GRANT ALL ON SEQUENCES TO "web-user";

Danger

The app connects as web-user (in the afroai/db-connection secret). The schema and tables are owned by afroai_admin, so web-user needs the GRANTs above — without them every page fails with "permission denied for table …".

Tip

The seed includes the reference data the agent-creation UI needs (categories, languages, tones, creativity levels, response lengths) and an initial admin user. Update that user's password before going live. The live Initialize Database tool is an alternative for a from-scratch database.

6. Scale & harden

Autoscaling, connection pooling, shared keys, backups, and observability for production.

Configure ECS service autoscaling

Add Application Auto Scaling: scale agent-web/agent-service on CPU or ALB request count, and agent-worker on queue depth.

Register a scalable target and a CPU target-tracking policy

aws application-autoscaling register-scalable-target --service-namespace ecs `
  --resource-id service/afroai/agent-web --scalable-dimension ecs:service:DesiredCount `
  --min-capacity 3 --max-capacity 20
# Passing the JSON from a file avoids quote-escaping differences between PowerShell versions
@'
{ "TargetValue": 60.0, "PredefinedMetricSpecification": { "PredefinedMetricType": "ECSServiceAverageCPUUtilization" } }
'@ | Out-File -FilePath web-cpu-policy.json -Encoding ascii
aws application-autoscaling put-scaling-policy --service-namespace ecs `
  --resource-id service/afroai/agent-web --scalable-dimension ecs:service:DesiredCount `
  --policy-name afroai-web-cpu --policy-type TargetTrackingScaling `
  --target-tracking-scaling-policy-configuration file://web-cpu-policy.json

Warning

Before scaling agent-web beyond one task, configure shared Data Protection keys (persist to Redis) and enable ALB stickiness + WebSocket support, or SignalR chat and cookies break across replicas.

Add RDS Proxy for connection pooling

EF Core opens many connections under load; many Fargate tasks multiply that. Put RDS Proxy in front of PostgreSQL and point ConnectionStrings__DefaultConnection at the proxy endpoint to avoid exhausting max_connections.
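A rough upper bound shows why pooling matters at the production task counts from Phase 1. The Python sketch below assumes one connection pool per task and a maximum pool size of 100 (the Npgsql default, assuming the connection string does not override it); the 800-connection limit is a made-up figure, so read the real one with SHOW max_connections on your instance. Both function names are hypothetical.

```python
def worst_case_connections(task_counts, max_pool_size=100):
    """Upper bound on open connections if every task fills its pool."""
    return sum(task_counts.values()) * max_pool_size


def needs_pooling(task_counts, db_max_connections, max_pool_size=100):
    """True if the worst case would exceed the database's connection limit."""
    return worst_case_connections(task_counts, max_pool_size) > db_max_connections


if __name__ == "__main__":
    # Minimum production counts from the Phase 1 tip.
    production = {"agent-web": 3, "agent-service": 3, "agent-api": 2, "agent-worker": 2}
    print(worst_case_connections(production))
    print(needs_pooling(production, db_max_connections=800))  # 800 is hypothetical
```

Real usage is usually well below this bound; the point is only that the bound grows with every task you add, while max_connections does not.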

Tip

The KernelMemory RAG tables (km- prefix) grow with ingested knowledge — schedule backups and watch storage.

Enable observability and backups

AETHER emits OpenTelemetry traces/metrics (via Aspire ServiceDefaults). Forward them to CloudWatch / AWS Distro for OpenTelemetry, and enable automated backups.

Turn on RDS automated backups

aws rds modify-db-instance --db-instance-identifier afroai-pg --backup-retention-period 14 --apply-immediately

7. Operations & cost control

Watch the logs, and stop/start the whole stack so you are not billed while idle.

Monitor logs and service health

All four containers log to the /ecs/afroai CloudWatch group. Tail it live, check service health, and filter for errors fast.

Live tail of every AETHER container

aws logs tail /ecs/afroai --follow

Find errors quickly

aws logs tail /ecs/afroai --since 15m --filter-pattern "Exception"

Service health (running vs desired) + ALB target health

aws ecs describe-services --cluster afroai --services agent-web agent-service agent-api agent-worker `
  --query "services[].{name:serviceName,running:runningCount,desired:desiredCount,lastEvent:events[0].message}" --output table

aws elbv2 describe-target-health --target-group-arn <TG_ARN> --query "TargetHealthDescriptions[].TargetHealth.State" --output text
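For scripted checks, the same describe-services query can be run with --output json instead of --output table and filtered for services that are below their desired count. A minimal Python sketch under that assumption; unhealthy_services is a hypothetical helper and the sample data is invented.

```python
import json


def unhealthy_services(describe_output: str) -> list:
    """Return a summary line for each service running fewer tasks than desired.

    Expects the JSON list produced by the describe-services query above,
    with name, running, desired, and lastEvent fields per service.
    """
    rows = json.loads(describe_output)
    return [
        f"{row['name']}: {row['running']}/{row['desired']} running ({row['lastEvent']})"
        for row in rows
        if row["running"] < row["desired"]
    ]


if __name__ == "__main__":
    # Invented sample output, for illustration only.
    sample = json.dumps([
        {"name": "agent-web", "running": 1, "desired": 1, "lastEvent": "steady state"},
        {"name": "agent-api", "running": 0, "desired": 1, "lastEvent": "task failed to start"},
    ])
    for line in unhealthy_services(sample):
        print(line)
```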

Tip

A resource that 404s only in production is almost always a Linux case-sensitivity issue (the container filesystem is case-sensitive; Windows dev is not). The error page hides the real exception — the CloudWatch log has the stack trace.

Shut everything down (stop idle billing)

The big variable costs are Fargate (billed per running task) and the RDS instance. Scale every service to zero and stop the database. Run this whenever you finish a session.

Scale all services to 0 and stop RDS

foreach ($s in "agent-web","agent-service","agent-api","agent-worker") {
  aws ecs update-service --cluster afroai --service $s --desired-count 0 | Out-Null
  Write-Host "scaled $s -> 0"
}
aws rds stop-db-instance --db-instance-identifier afroai-pg
Write-Host "RDS stopping; Fargate tasks draining."

Warning

Cannot be stopped, only deleted: ElastiCache, the ALB, and (if used) Amazon MQ bill hourly even when idle. A cache.t3.micro node plus an ALB costs far less than Fargate and RDS, but it is not zero outside the Free Tier, so check current af-south-1 pricing before leaving them up. To zero them too, delete them and recreate them on startup (Phase 2 for the cache and broker, Phase 4 for the ALB). RDS stays stopped for at most 7 days before AWS starts it again automatically; storage is still billed while stopped.
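The 7-day limit means a stopped evaluation database starts billing for compute again a week later unless you stop it again. The Python sketch below computes that moment from the time you stopped the instance; rds_auto_start_time is a hypothetical helper and the timestamp is an example.

```python
from datetime import datetime, timedelta, timezone

RDS_MAX_STOPPED = timedelta(days=7)  # limit stated in the warning above


def rds_auto_start_time(stopped_at: datetime) -> datetime:
    """Latest time the instance stays stopped before AWS starts it again."""
    return stopped_at + RDS_MAX_STOPPED


if __name__ == "__main__":
    stopped = datetime(2025, 1, 1, 18, 0, tzinfo=timezone.utc)  # example timestamp
    print(rds_auto_start_time(stopped).isoformat())
```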

Start everything back up

Bring the database back first, wait for it, then scale the services to their running counts.

Start RDS, wait, then scale services back to 1 (or your production counts)

aws rds start-db-instance --db-instance-identifier afroai-pg
aws rds wait db-instance-available --db-instance-identifier afroai-pg
foreach ($s in "agent-service","agent-web","agent-worker","agent-api") {
  aws ecs update-service --cluster afroai --service $s --desired-count 1 | Out-Null
  Write-Host "scaled $s -> 1"
}

Note

Start agent-service before agent-web so the orchestrator is ready when the UI comes up. If you deleted ElastiCache / Amazon MQ on shutdown, recreate them (Phase 2) and refresh the afroai/queue secret + Redis env before scaling the services up.