Multi-environment deployment, architecture, and operational guides
This guide documents the process for creating a new environment (staging, demo, etc.) from scratch. This is only needed for initial environment creation - subsequent updates use normal terraform apply.
The bootstrap process handles several chicken-and-egg problems: App Runner cannot start without an image in ECR, but the ECR repository doesn't exist until Terraform runs; and the DNS certificate validation records cannot be planned until the custom domain association exists.
Goal: Create all AWS resources except custom domain configuration
```bash
# 1. Temporarily disable DNS resources
cd infra
mv dns.tf dns.temp

# 2. Create base infrastructure
# This creates: ECR, IAM roles/policies, DynamoDB, S3, Secrets Manager, App Runner
# Note: App Runner will fail (CREATE_FAILED) - this is expected, no image exists yet
terraform apply -var-file=environments/staging/vars.tfvars

# Terraform output will show App Runner in CREATE_FAILED state - ignore this for now
```
Resources Created:
- staging-kaizencoach-app (ECR repository)
- staging-kaizencoach-users (DynamoDB table)
- staging-kaizencoach-data (S3 bucket, with lifecycle, encryption, versioning)
- staging-kaizencoach-app-secrets (Secrets Manager secret, currently empty)

⚠️ CRITICAL: Update config.py BEFORE building the Docker image!
Goal: Add new environment to config.py so OAuth and GCP work correctly
The application needs to know about your new environment BEFORE you build the Docker image. Without this step, you’ll get OAuth callback errors when users try to log in.
Edit config.py (around lines 35-50):
```python
# Add your new environment to REDIRECT_URIS
REDIRECT_URIS = {
    'dev': 'http://127.0.0.1:5000/callback',
    'staging': 'https://staging.kaizencoach.training/callback',
    'prod': 'https://www.kaizencoach.training/callback',
    'demo-shane': 'https://demo-shane.kaizencoach.training/callback',  # ADD NEW ENV
}

# Add your new environment to GCP_PROJECTS
GCP_PROJECTS = {
    'dev': 'kaizencoach-dev',
    'staging': 'kaizencoach-staging',
    'prod': 'kaizencoach-prod',
    'demo': 'kaizencoach-demo',
    'demo-shane': 'kaizencoach-shane',  # ADD NEW ENV
}
```
Why This Matters:
- REDIRECT_URIS: Strava OAuth will reject logins if the callback URL doesn't match
- GCP_PROJECTS: Vertex AI will fail if it can't find the correct GCP project

Consequences of Skipping:
- Users hit OAuth callback errors when they try to log in
- Vertex AI requests fail because the app can't resolve the environment's GCP project
Verify Your Changes:
```bash
grep -A 10 "REDIRECT_URIS = {" config.py
grep -A 10 "GCP_PROJECTS = {" config.py
```
Goal: Provide the image that App Runner needs to start
```bash
# 3. Get ECR login credentials
aws ecr get-login-password --region eu-west-1 | \
  sudo docker login --username AWS --password-stdin \
  321490400104.dkr.ecr.eu-west-1.amazonaws.com

# 4. Build the application image
cd ../  # Back to project root
sudo docker build -t 321490400104.dkr.ecr.eu-west-1.amazonaws.com/staging-kaizencoach-app:latest .

# 5. Push to ECR
sudo docker push 321490400104.dkr.ecr.eu-west-1.amazonaws.com/staging-kaizencoach-app:latest

# Note: Replace 321490400104 with your AWS account ID
```
Verify Image Exists:
```bash
aws ecr describe-images \
  --repository-name staging-kaizencoach-app \
  --region eu-west-1
```
Goal: Add actual credentials to Secrets Manager
⚠️ CRITICAL: Generate NEW Secrets for Each Environment
DO NOT copy secrets from prod to staging or between environments!
Before proceeding, read SECRETS_GUIDE.md, which explains why every environment needs its own freshly generated secrets.
Quick secret generation commands:
```bash
# Generate NEW Flask secret key (different for each environment!)
openssl rand -hex 32

# Generate NEW Garmin encryption key (different for each environment!)
openssl rand -base64 32

# Generate NEW Strava verify token (can be same or different per environment)
openssl rand -hex 20

# Get Strava credentials from https://www.strava.com/settings/api
# (each environment needs its own Strava app with unique callback domain)

# Format GCP service account JSON as single-line
cat .keys/kaizencoach-staging-sa-key.json | jq -c '.'
```
Prepare the Service Account JSON:
If you prefer a browser-based workflow: use an online JSON formatter to convert the JSON block generated by GCP to single-line JSON, then paste that single-line JSON into a JSON escaper to escape it. Either route produces the correct format for the GOOGLE_APPLICATION_CREDENTIALS_JSON value in AWS Secrets Manager.
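If you have jq installed, the same single-line-then-escape transformation can be done entirely offline. This sketch uses a fake stand-in key file; your real key lives in `.keys/` and has more fields:

```bash
# Illustrative stand-in for the GCP key file (field values here are fake;
# the real file lives in .keys/)
cat > /tmp/sa-key.json <<'EOF'
{
  "type": "service_account",
  "project_id": "kaizencoach-staging",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n"
}
EOF

# Single-line the JSON (-c), then re-emit that line as a quoted, escaped
# JSON string (-R) - the format needed for the
# GOOGLE_APPLICATION_CREDENTIALS_JSON value.
jq -c . /tmp/sa-key.json | jq -R . > /tmp/sa-escaped.txt
cat /tmp/sa-escaped.txt
```

The second `jq -R .` does the escaping step (quotes become `\"`, newline escapes are doubled), so the output can be pasted directly as a string value inside the secrets JSON.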
Prepare the secrets file based on the following template
```json
{
  "STRAVA_CLIENT_ID": "CLIENT ID FROM STRAVA API SETTINGS HERE",
  "STRAVA_CLIENT_SECRET": "CLIENT SECRET FROM STRAVA API SETTINGS HERE",
  "STRAVA_VERIFY_TOKEN": "STRAVA VERIFY TOKEN FROM ABOVE HERE",
  "FLASK_SECRET_KEY": "FLASK SECRET KEY FROM ABOVE HERE",
  "GCP_PROJECT_ID": "GCP PROJECT HERE",
  "GCP_LOCATION": "GCP REGION HERE",
  "GARMIN_ENCRYPTION_KEY": "GARMIN ENCRYPTION KEY FROM ABOVE HERE",
  "GOOGLE_APPLICATION_CREDENTIALS_JSON": "SINGLE LINE JSON HERE"
}
```
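As a sketch (assuming jq and openssl are available, and using the staging secret name from Phase 1), the template can be assembled and validated locally, then uploaded with the AWS CLI's `put-secret-value`. The Strava and GCP fields below are placeholders you must fill in before uploading:

```bash
# Generate fresh per-environment values (never reuse them across environments)
FLASK_KEY=$(openssl rand -hex 32)
GARMIN_KEY=$(openssl rand -base64 32)
VERIFY_TOKEN=$(openssl rand -hex 20)

# Assemble secrets.json; jq handles all quoting and escaping for us.
# FILL_ME_IN values are placeholders - complete them before uploading.
jq -n \
  --arg flask "$FLASK_KEY" \
  --arg garmin "$GARMIN_KEY" \
  --arg verify "$VERIFY_TOKEN" \
  '{
    STRAVA_CLIENT_ID: "FILL_ME_IN",
    STRAVA_CLIENT_SECRET: "FILL_ME_IN",
    STRAVA_VERIFY_TOKEN: $verify,
    FLASK_SECRET_KEY: $flask,
    GCP_PROJECT_ID: "FILL_ME_IN",
    GCP_LOCATION: "FILL_ME_IN",
    GARMIN_ENCRYPTION_KEY: $garmin,
    GOOGLE_APPLICATION_CREDENTIALS_JSON: "FILL_ME_IN"
  }' > secrets.json

# Sanity-check before uploading (jq exits non-zero on invalid JSON)
jq empty secrets.json && echo "secrets.json is valid JSON"

# Then upload (uncomment once the placeholders are filled in):
# aws secretsmanager put-secret-value \
#   --secret-id staging-kaizencoach-app-secrets \
#   --secret-string file://secrets.json \
#   --region eu-west-1
```

Delete secrets.json after the upload so plaintext secrets don't linger on disk.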
```bash
aws apprunner start-deployment \
  --service-arn $(terraform output -raw apprunner_service_arn) \
  --region eu-west-1
```
Goal: Get App Runner to RUNNING state now that image exists
```bash
# 6. Apply Terraform again - this will replace the failed App Runner service
cd infra
terraform apply -var-file=environments/staging/vars.tfvars

# This time App Runner will:
# - Destroy the failed service
# - Create new service
# - Pull the image from ECR
# - Start successfully and reach RUNNING state
```
Verify App Runner is Running:
```bash
# Check in AWS Console: App Runner > staging-kaizencoach-service
# Status should show: RUNNING (not CREATE_FAILED)

# Or via CLI:
aws apprunner describe-service \
  --service-arn $(terraform output -raw apprunner_service_arn) \
  --region eu-west-1 \
  --query 'Service.Status'
```
Goal: Add custom domain and DNS records
```bash
# 7. Re-enable DNS resources
mv dns.temp dns.tf

# 8. Create custom domain association (Stage 1)
# This generates the certificate validation records
terraform apply \
  -var-file=environments/staging/vars.tfvars \
  -target=aws_apprunner_custom_domain_association.main

# Wait ~30 seconds for certificate validation to be ready

# 9. Create DNS validation and A records (Stage 2)
terraform apply -var-file=environments/staging/vars.tfvars

# This creates:
# - CNAME records for certificate validation
# - A record for staging.kaizencoach.training
# - A record for www.staging.kaizencoach.training
```
Why Two Stages?
Terraform’s for_each requires all keys at plan time. App Runner’s certificate validation records are only available after the custom domain association is created. This is a Terraform limitation, not a bug.
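A sketch of the validation-record resource that forces the two-stage apply (resource and variable names are illustrative and may differ from the project's actual dns.tf):

```hcl
# Illustrative sketch - names may differ from the real dns.tf.
# The for_each keys come from an attribute that only exists after the
# association has been applied, hence the -target pass in step 8.
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for rec in aws_apprunner_custom_domain_association.main.certificate_validation_records :
    rec.name => rec
  }

  zone_id = var.r53_zone_id
  name    = each.value.name
  type    = each.value.type
  ttl     = 300
  records = [each.value.value]
}
```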
Goal: Confirm everything works
```bash
# 14. Wait for DNS propagation (5-60 minutes)
watch dig staging.kaizencoach.training

# 15. Check SSL certificate status in AWS Console
# App Runner > staging-kaizencoach-service > Custom domains
# Status should show: Active (not Pending validation)

# 16. Test the application
curl https://staging.kaizencoach.training
curl https://www.staging.kaizencoach.training

# 17. Check logs
aws logs tail /aws/apprunner/staging-kaizencoach-service --follow
```
Goal: Enable real-time activity notifications from Strava
CRITICAL: Each environment needs its own webhook subscription. This is a manual step and cannot be automated in Terraform.
```bash
# 18. Get your Strava credentials from Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id staging-kaizencoach-app-secrets \
  --region eu-west-1 \
  --query SecretString --output text | jq -r '.STRAVA_CLIENT_ID, .STRAVA_CLIENT_SECRET, .STRAVA_VERIFY_TOKEN'

# Note these three values - you'll need them for the webhook setup

# 19. Check for any existing webhook subscriptions
curl -G https://www.strava.com/api/v3/push_subscriptions \
  -d client_id=YOUR_CLIENT_ID \
  -d client_secret=YOUR_CLIENT_SECRET

# If a webhook exists, delete it first (replace SUBSCRIPTION_ID with the id from above):
curl -X DELETE \
  "https://www.strava.com/api/v3/push_subscriptions/SUBSCRIPTION_ID" \
  -F client_id=YOUR_CLIENT_ID \
  -F client_secret=YOUR_CLIENT_SECRET

# 20. Create new webhook subscription
curl -X POST https://www.strava.com/api/v3/push_subscriptions \
  -F client_id=YOUR_CLIENT_ID \
  -F client_secret=YOUR_CLIENT_SECRET \
  -F callback_url=https://staging.kaizencoach.training/strava_webhook \
  -F verify_token=YOUR_VERIFY_TOKEN

# Should respond with:
# {
#   "id": 305097,
#   "callback_url": "https://staging.kaizencoach.training/strava_webhook",
#   "created_at": "2025-12-03T19:00:00Z"
# }

# SAVE THE SUBSCRIPTION ID - you'll need it to manage this webhook later
```
```bash
# 21. Test webhook endpoint
curl "https://staging.kaizencoach.training/strava_webhook?hub.mode=subscribe&hub.challenge=test12345&hub.verify_token=YOUR_VERIFY_TOKEN"

# Should respond with:
# {"hub.challenge": "test12345"}
```
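Conceptually, the step 21 handshake just requires the endpoint to echo `hub.challenge` back when the verify token matches. A shell sketch of that logic (illustrative only; the real handler lives in the Flask app):

```bash
# Illustrative sketch of the webhook subscription handshake: echo back
# hub.challenge as JSON only when mode is "subscribe" and the token matches.
handle_challenge() {
  local mode=$1 challenge=$2 token=$3 expected=$4
  if [ "$mode" = "subscribe" ] && [ "$token" = "$expected" ]; then
    jq -nc --arg c "$challenge" '{"hub.challenge": $c}'
  else
    return 1
  fi
}

handle_challenge subscribe test12345 mytoken mytoken
```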
```bash
# 22. Test end-to-end: Edit a Strava activity and watch logs
aws logs tail /aws/apprunner/staging-kaizencoach-service/service --follow --region eu-west-1 | grep "Webhook event"

# Should see: --- Webhook event received: {...} ---
```
Why Manual? Strava webhook subscriptions are created through Strava's push subscription API and have no Terraform resource, so each environment's subscription must be set up by hand.

For Demo Environments: each demo instance needs its own Strava application (with its own callback domain), its own webhook subscription, and its own freshly generated secrets.
Important: Use a single hosted zone for all environments, not separate zones per subdomain.
```
kaizencoach.training hosted zone contains:
├── kaizencoach.training                 A → prod App Runner
├── www.kaizencoach.training             A → prod App Runner
├── staging.kaizencoach.training         A → staging App Runner
├── www.staging.kaizencoach.training     A → staging App Runner
└── demo-xxx.kaizencoach.training        A → demo App Runner
```
Your dns.tf should use data.aws_route53_zone.primary to reference the existing zone, not create new hosted zones.
❌ Separate hosted zones:
```
├── kaizencoach.training         (Zone 1)
└── staging.kaizencoach.training (Zone 2)  ← Requires delegation, adds complexity
```
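As a sketch, referencing the shared zone in dns.tf looks like this (assuming the `r53_zone_id` variable shown in vars.tfvars below):

```hcl
# Look up the single shared hosted zone instead of creating one per
# environment (sketch; matches the data.aws_route53_zone.primary reference)
data "aws_route53_zone" "primary" {
  zone_id = var.r53_zone_id
}

# Every environment's records then attach to the same zone:
#   zone_id = data.aws_route53_zone.primary.zone_id
```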
Ensure your environments/{env}/vars.tfvars has:
```hcl
# environments/staging/vars.tfvars
app_name    = "kaizencoach"
environment = "staging"
domain_name = "staging.kaizencoach.training"
r53_zone_id = "Z0920467KPHM0P6Q2MOG"  # ID of kaizencoach.training zone

common_tags = {
  Application = "kaizencoach"
  Environment = "staging"
  ManagedBy   = "terraform"
  CostCenter  = "kaizencoach-staging"
  Project     = "staging-kaizencoach"
}
```
Cause: No image in ECR.
Solution: Build and push the Docker image (Phase 2).
Cause: Missing or invalid secrets.
Solution: Populate secrets in Secrets Manager (Phase 5).
Causes:
Check:
```bash
# DNS propagation
dig staging.kaizencoach.training

# Certificate status
aws apprunner describe-custom-domains \
  --service-arn $(terraform output -raw apprunner_service_arn) \
  --region eu-west-1
```
Cause: Trying to create validation records before the custom domain association exists.
Solution: Use the two-stage apply (Phase 4, steps 8-9).
Cause: App Runner cached the empty secrets from the initial deploy.
Solution: Force a new deployment after populating secrets:
```bash
aws apprunner start-deployment --service-arn <arn>
```
Once bootstrapped, the environment works like normal:
```bash
# Regular updates (code changes, config changes)
terraform apply -var-file=environments/staging/vars.tfvars

# Single apply works fine after initial setup
# Two-stage DNS apply only needed for bootstrap
```
Disable DNS during bootstrap (`mv dns.tf dns.temp`) and re-enable it afterwards (`mv dns.temp dns.tf`). Replace 321490400104 with your AWS account ID throughout this guide.
```
{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/{ENVIRONMENT}-kaizencoach-app:latest
```
Example:
```
321490400104.dkr.ecr.eu-west-1.amazonaws.com/staging-kaizencoach-app:latest
```
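When scripting builds, the convention can be parameterised; the values below are this guide's examples, so substitute your own:

```bash
# Build the ECR image URI from its parts (values are this guide's examples)
ACCOUNT_ID=321490400104
REGION=eu-west-1
ENVIRONMENT=staging
IMAGE_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/${ENVIRONMENT}-kaizencoach-app:latest"
echo "$IMAGE_URI"
```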
Use this checklist when creating a new environment, ensuring these files exist and are updated:
```
# Application code (CRITICAL - update BEFORE building Docker image)
config.py
├── REDIRECT_URIS dict          # Add new environment's callback URL
└── GCP_PROJECTS dict           # Add new environment's GCP project ID

# Infrastructure configuration
infra/
├── environments/
│   └── {env}/
│       ├── vars.tfvars         # Environment-specific variables
│       └── backend.tfbackend   # Terraform backend config
└── dns.tf                      # Temporarily renamed during bootstrap

# Service account keys
.keys/
└── kaizencoach-{env}-sa-key.json  # GCP service account key
```
⚠️ Missing config.py updates will cause OAuth callback errors!
If you encounter issues during bootstrap, start with the Troubleshooting section above. For additional help, reference:
- MULTI_ENVIRONMENT_SETUP.md - Overall architecture
- CUSTOM_DOMAIN_SETUP.md - DNS configuration details