End-to-end demo of the compiling-architecture skill workflow on a typical AWS-hosted production service: a JavaScript API. It follows the README's five-section Progressive-Refinement guidance: one signal per spec change, with each spec change followed by a compile so the reader sees what the compiler does with the new signal.
Steps 1-4: the author writes the bare-minimum spec (cloud, language, platform, availability), reviews the baseline compile, then adds the latency NFR. The compiler REJECTS the spec because caching-required-low-latency activates on p99 <= 100ms and requires features.caching: true. The annotated body and the Suggestions block tell the author exactly which threshold fired and which constraint to flip.
Steps 5-8: the author opts in to caching, then adds real throughput numbers. cache-aside--redis joins the pattern set; a warn_nfr advisory fires briefly (caching enabled but no throughput data) and resolves once the real QPS numbers land. The composes graph at Step 8 visualises every inter-pattern relationship the registry declares for selected patterns: solid arrows for the wiring the implementing-architecture skill will consume, dashed arrows for edges the registry declared but this spec didn't activate.
Steps 9-11: the author commits to a cost story (intent + operating_model + ceilings); the compiler runs full cost feasibility (Pattern OpEx + Ops Team Cost + CapEx) and surfaces a ceiling breach with three concrete architectural levers. The author picks one (raise ceilings to fit the real cost shape), then promotes every assumptions.* key into the explicit spec body and prepends the # STATUS: APPROVED comment header per the compiling-architecture skill's contract. The result is a self-contained architecture.yaml that the skills/implementing-architecture skill reads as its input contract.
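The rejection described for Steps 1-4 can be pictured as a simple guard over the spec. The sketch below is illustrative only and is not the compiler's actual code: the function name, the RuleResult type, and the exact key paths (nfr.latency.p99Milliseconds, features.caching) are assumptions inferred from the field names this walkthrough shows.

```python
from dataclasses import dataclass

# Threshold quoted in the walkthrough ("p99 <= 100ms"); the constant name is hypothetical.
P99_THRESHOLD_MS = 100


@dataclass
class RuleResult:
    accepted: bool
    suggestions: list  # human-readable hints, empty when the spec passes


def check_caching_required_low_latency(spec: dict) -> RuleResult:
    """Reject a spec whose p99 budget is at or below the threshold unless caching is on.

    Key paths are assumptions based on the compiled output shown in this demo.
    """
    p99 = spec.get("nfr", {}).get("latency", {}).get("p99Milliseconds")
    caching = spec.get("features", {}).get("caching", False)
    if p99 is not None and p99 <= P99_THRESHOLD_MS and not caching:
        hint = (
            f"caching-required-low-latency fired on p99Milliseconds={p99} "
            f"(<= {P99_THRESHOLD_MS}); set features.caching: true"
        )
        return RuleResult(accepted=False, suggestions=[hint])
    return RuleResult(accepted=True, suggestions=[])


# A tight latency budget without caching is rejected; opting in to caching passes.
tight = {"nfr": {"latency": {"p99Milliseconds": 100}}}
print(check_caching_required_low_latency(tight).accepted)
tight["features"] = {"caching": True}
print(check_caching_required_low_latency(tight).accepted)
```

The point of the sketch is the shape of the feedback: the rule names the threshold that fired and the single constraint the author needs to flip.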
Non-agentic demo — no multi-agent topology, hosting platform, or agent-specific patterns. See the agentic counterpart for the agentic-scale version of the same workflow: hosting-mismatch rejection, multi-agent cost feasibility, per-pattern defaultConfig overrides, composes graph at agentic density.
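The solid versus dashed distinction in the composes graph can be sketched as a classification over declared edges. This is a guess at the logic, not the registry's implementation: it assumes an edge counts as activated when both endpoints are in the selected pattern set, and the pattern id hypothetical-unselected-pattern is invented purely to show a dashed edge.

```python
def classify_edges(composes: dict, selected: set):
    """Split declared composes edges into solid (activated) and dashed (declared only).

    Assumption: an edge is activated when its target is also a selected pattern.
    Edges whose source is not selected are not drawn at all.
    """
    solid, dashed = [], []
    for source, relations in composes.items():
        if source not in selected:
            continue
        for relation, targets in relations.items():
            for target in targets:
                edge = (source, relation, target)
                (solid if target in selected else dashed).append(edge)
    return solid, dashed


# Edges taken from the baseline output, plus one invented target for illustration.
composes = {
    "resilience-circuit-breaker": {"wraps": ["sync-request-reply-rest"]},
    "obs-golden-signals": {
        "layered_after": ["obs-open-telemetry-baseline", "hypothetical-unselected-pattern"]
    },
}
selected = {
    "resilience-circuit-breaker",
    "sync-request-reply-rest",
    "obs-golden-signals",
    "obs-open-telemetry-baseline",
}
solid, dashed = classify_edges(composes, selected)
print(solid)
print(dashed)
```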
The author starts with the bare minimum: cloud, language, platform, and a single availability target. No latency budget yet, no throughput, no feature flags, no cost ceilings. The compiler will fill in every other decision it has an opinion about and pick a baseline pattern set.
project:
  name: aws production api
  domain: product-catalog
constraints:
  cloud: aws
  language: javascript
  platform: api
nfr:
  availability:
    target: 0.999
The compiler runs and produces a complete compiled spec. The assumptions section is the headline output: every default the compiler applied on the author's behalf is enumerated explicitly (feature flags off, low-throughput defaults, weekly deploy cadence, zero dedicated ops engineers for cost math, and so on). Verbose mode also annotates each spec field with the patterns that activated on it (# <pattern-id>, ...), and writes rejected-patterns.yaml for the patterns considered but dropped.
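One way to picture how an assumptions section is derived: walk a table of defaults and keep only the keys the author did not set. The sketch below is a minimal illustration under stated assumptions, not the compiler's algorithm; the function name fill_assumptions is hypothetical, and the defaults table is a small subset of the values shown in the compiled output, with their nesting assumed.

```python
import copy


def fill_assumptions(spec: dict, defaults: dict) -> dict:
    """Return the defaults the author left unset, mirroring the spec's nesting.

    Values the author wrote explicitly never appear in the result.
    """
    assumptions = {}
    for key, default in defaults.items():
        if key not in spec:
            assumptions[key] = copy.deepcopy(default)
        elif isinstance(default, dict) and isinstance(spec[key], dict):
            nested = fill_assumptions(spec[key], default)
            if nested:
                assumptions[key] = nested
    return assumptions


# The author's bare-minimum spec from Step 1.
spec = {
    "constraints": {"cloud": "aws", "language": "javascript", "platform": "api"},
    "nfr": {"availability": {"target": 0.999}},
}
# Illustrative subset of defaults; key placement is an assumption.
defaults = {
    "constraints": {"tenantCount": 1},
    "features": {"caching": False},
    "nfr": {
        "rpo_minutes": 60,
        "rto_minutes": 60,
        "latency": {"p95Milliseconds": 500, "p99Milliseconds": 1000},
    },
    "operating_model": {"deploy_freq": "weekly", "ops_team_size": 0},
}
print(fill_assumptions(spec, defaults))
```

Once the author later adds a latency budget, a function like this would stop reporting that key as an assumption, which matches the one-signal-per-change flow of the demo.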
# ─── input spec with pattern-activation annotations ───
# Each `# pattern-id` shows the patterns that activated on this spec value.
cloud: aws # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
language: javascript # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
platform: api # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
target: 0.999 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
# ─── what the compiler FILLED IN as assumptions ───
assumptions:
constraints:
tenantCount: 1
features:
caching: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (9 more)
async_messaging: false # arch-serverless--aws, resilience-timeouts-retries-backoff, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (9 more)
latency:
p95Milliseconds: 500 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
p99Milliseconds: 1000 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
throughput:
peak_query_per_second_read: 5 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
peak_query_per_second_write: 1 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (4 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, iac-cloudformation
compliance:
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (2 more)
security:
auth: oauth2_oidc # arch-serverless--aws, idp-oidc--cognito, api-rest-resource-oriented, ... (2 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, api-rest-resource-oriented
audit_logging: false # arch-serverless--aws, db-managed-postgres, api-rest-resource-oriented, ... (5 more)
# … (more defaults below; expand the full output to see them)
# ─── Matched Patterns based on input spec ───
# meta = policy gates (always emitted when their feature flag is set)
# P0 = high priority — load-bearing architectural decisions
# P1 = mid priority — operational + observability + security baseline
# P2/P3 = lower priority — refinements + governance + docs
# Override priority by adding `patterns.<id>.recommended_priority: P0` to spec.
patterns:
P0: # (4 patterns)
- arch-serverless--aws # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
- db-managed-postgres # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
- arch-serverless-pay-per-use--aws-lambda # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
- iac-cloudformation # AWS-native IaC; deep service coverage; AWS-specific.
P1: # (15 patterns)
- resilience-timeouts-retries-backoff # Deadlines, bounded retries, exponential backoff with jitter; avoid retry storms.
composes:
wraps: ['sync-request-reply-rest']
- idp-oidc--cognito # Use AWS Cognito as a managed identity provider natively integrated with the AWS ecosystem. Provides user pools (authentication, registration, MFA), identity pools (federated AWS resource access), social and enterprise federation, and Lambda triggers for customization. Best choice for AWS-native applications requiring tight IAM integration. Free tier: 50,000 MAU.
- resilience-circuit-breaker # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
composes:
wraps: ['sync-request-reply-rest']
- api-rest-resource-oriented # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
- sync-request-reply-rest # Synchronous HTTP APIs; simple integration; needs timeouts/retries/backpressure.
- deploy-blue-green # Two environments; switch traffic; fast rollback; higher infra cost.
composes:
layered_after: ['iac-cloudformation']
- sec-auth-oauth2-oidc # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
composes:
wraps: ['api-rest-resource-oriented']
- crud-single-model # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
- finops-cost-allocation-tags # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
- release-feature-flags # Decouple deploy from release; safer experiments; needs kill switches and governance.
- obs-telemetry-backend--aws-cloudwatch # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
- obs-golden-signals # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
composes:
layered_after: ['obs-open-telemetry-baseline']
- obs-open-telemetry-baseline # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
composes:
co_runs_with: ['api-rest-resource-oriented']
- finops-budget-guardrails # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
- ops-slo-error-budgets # Define SLOs and error budgets to balance reliability and velocity.
P2: # (3 patterns)
- arch-egress-minimization # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
- api-versioning-header # Version via headers/media types; keeps URLs stable; harder to debug and cache.
- gov-system-manifest # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
composes:
layered_after: ['iac-cloudformation']
co_runs_with: ['release-feature-flags', 'gov-adrs-mandatory', 'ops-runbooks']
P3: # (2 patterns)
- ops-runbooks # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
- gov-adrs-mandatory # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
# ─── warns and cost feasibility ───
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 50,650
# Pattern OpEx (monthly): $ 420
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 420
# Total TCO (24mo): $ 60,730
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
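The figures in the summary above can be reproduced with a few lines of arithmetic. This is a sketch that assumes TCO is CapEx plus total monthly OpEx times the amortization period (which matches the printed numbers) and that a ceiling passes when the cost is less than or equal to it; the function name and return keys are hypothetical.

```python
def cost_feasibility(capex, pattern_opex, ops_team_cost, months,
                     capex_ceiling, opex_ceiling):
    """Recompute the summary totals and ceiling checks from their inputs.

    Assumptions: TCO = CapEx + monthly OpEx * months; a ceiling check passes on <=.
    """
    total_opex = pattern_opex + ops_team_cost
    tco = capex + total_opex * months
    return {
        "total_opex": total_opex,
        "tco": tco,
        "capex_ok": capex <= capex_ceiling,
        "opex_ok": total_opex <= opex_ceiling,
    }


# Inputs copied from the baseline summary: $50,650 CapEx, $420 pattern OpEx,
# $0 ops team cost, 24-month amortization, ceilings of $1,000 and $500.
summary = cost_feasibility(50_650, 420, 0, 24, 1_000, 500)
print(summary)
# total_opex=420, tco=60730, capex_ok=False (FAIL), opex_ok=True (PASS)
```

The CapEx failure at this stage comes from the default ceiling of $1,000 being checked against $50,650 of one-time adoption cost; monthly OpEx of $420 sits under the $500 default.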
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
language: javascript # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
platform: api # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
nfr:
availability:
target: 0.999 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
assumptions:
constraints:
saas-providers: []
disallowed-saas-providers: []
ai-inference-platforms: []
disallowed-ai-inference-platforms: []
model-vendors: []
disallowed-model-vendors: []
tenantCount: 1
features:
caching: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (9 more)
async_messaging: false # arch-serverless--aws, resilience-timeouts-retries-backoff, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (9 more)
latency:
p95Milliseconds: 500 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
p99Milliseconds: 1000 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
throughput:
peak_query_per_second_read: 5 # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (5 more)
peak_query_per_second_write: 1 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (4 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, iac-cloudformation
compliance:
gdpr: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
gdpr_rtbf: false
ccpa: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
hipaa: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (20 more)
sox: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (21 more)
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, resilience-timeouts-retries-backoff, ... (2 more)
security:
auth: oauth2_oidc # arch-serverless--aws, idp-oidc--cognito, api-rest-resource-oriented, ... (2 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, api-rest-resource-oriented
audit_logging: false # arch-serverless--aws, db-managed-postgres, api-rest-resource-oriented, ... (5 more)
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
single_resource_monthly_ops_usd: 10000
amortization_months: 24
cost:
intent:
priority: minimize-opex
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
preferences:
prefer_free_tier_if_possible: true # db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, idp-oidc--cognito, ... (2 more)
prefer_saas_first: false
patterns:
P0:
arch-serverless--aws: # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
compute_service: lambda # Options: lambda, fargate
api_gateway: api-gateway-http # Options: api-gateway-http, api-gateway-rest, alb, function-url
database: dynamodb # Options: dynamodb, aurora-serverless, rds-proxy
storage: s3-standard # Options: s3-standard, s3-intelligent-tiering, efs
auth_service: cognito # Options: cognito, lambda-authorizer, iam
event_bus: eventbridge # Options: eventbridge, sns-sqs, kinesis
db-managed-postgres: # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
provider: supabase # Options: supabase, neon, render, railway, digitalocean-app-platform
instance_size: small # Options: micro, small, medium, large
storage_gb: 8 # Range: 1-500
backup_retention_days: 7 # Range: 1-30
connection_pooling: true # Boolean
high_availability: false # Boolean
ssl_mode: require # Options: disable, allow, prefer, require, verify-ca, verify-full
arch-serverless-pay-per-use--aws-lambda: # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
memory_size: 512 # Options: 128, 256, 512, 1024, 2048, 3008
timeout: 30 # Range: 3-900
architecture: x86_64 # Options: x86_64, arm64
provisioned_concurrency: 0 # Range: 0-1000
iac-cloudformation: # AWS-native IaC; deep service coverage; AWS-specific.
stack_naming_convention: project-environment-resource # Options: project-environment-resource, environment-project-resource, flat-naming, custom
change_set_enabled: true # Boolean
termination_protection: false # Boolean
drift_detection: true # Boolean
stack_policy: none # Options: none, protect-all, protect-data-resources, custom
P1:
resilience-timeouts-retries-backoff: # Deadlines, bounded retries, exponential backoff with jitter; avoid retry storms.
initial_timeout_ms: 5000 # Range: 100-60000
max_retries: 3 # Range: 0-10
backoff_strategy: exponential_jitter # Options: exponential, exponential_jitter, linear, constant
backoff_multiplier: 2 # Range: 1-5
max_backoff_ms: 30000 # Range: 1000-300000
retry_on_timeout: true # Boolean
retry_on_errors: # Options: network, 5xx, 429, throttle, 4xx, all
- network
- 5xx
- throttle
circuit_breaker_enabled: false # Boolean
composes:
wraps:
- sync-request-reply-rest
idp-oidc--cognito: # Use AWS Cognito as a managed identity provider natively integrated with the AWS ecosystem. Provides user pools (authentication, registration, MFA), identity pools (federated AWS resource access), social and enterprise federation, and Lambda triggers for customization. Best choice for AWS-native applications requiring tight IAM integration. Free tier: 50,000 MAU.
pool_type: user-pool # Options: user-pool, user-pool-and-identity-pool
mfa_configuration: OPTIONAL # Options: OFF, OPTIONAL, ON
password_policy: medium # Options: low, medium, high
lambda_triggers_enabled: false # Boolean
resilience-circuit-breaker: # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
failure_threshold: 5 # Range: 1-20
success_threshold: 2 # Range: 1-10
timeout_duration: 60 # Range: 5-300
half_open_max_calls: 1 # Range: 1-10
fallback_strategy: fail_fast # Options: fail_fast, cached_response, default_value, alternative_service
composes:
wraps:
- sync-request-reply-rest
api-rest-resource-oriented: # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
pagination_style: offset # Options: offset, cursor, page_number
max_page_size: 100 # Range: 10-1000
versioning_strategy: uri # Options: uri, header, query_param, none
filtering_style: query_params # Options: query_params, json_body, graphql_like
cache_strategy: etag # Options: etag, last_modified, cache_control, none
id_format: uuid # Options: uuid, integer, slug, composite
response_envelope: false # Boolean
sync-request-reply-rest: # Synchronous HTTP APIs; simple integration; needs timeouts/retries/backpressure.
timeout_seconds: 30 # Range: 1-300
retry_strategy: exponential_backoff # Options: none, fixed_delay, exponential_backoff, exponential_backoff_jitter
max_retries: 3 # Range: 0-10
circuit_breaker_enabled: true # Boolean
rate_limiting_strategy: token_bucket # Options: none, token_bucket, leaky_bucket, fixed_window, sliding_window
idempotency_required: false # Boolean
deploy-blue-green: # Two environments; switch traffic; fast rollback; higher infra cost.
traffic_switch_strategy: instant # Options: instant, gradual, canary
health_check_required: true # Boolean
rollback_strategy: automatic # Options: automatic, manual, disabled
environment_parity: full # Options: full, scaled-down, minimal
warm_up_period_minutes: 5 # Range: 0-60
composes:
layered_after:
- iac-cloudformation
sec-auth-oauth2-oidc: # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
oauth_flow: authorization_code # Options: authorization_code, client_credentials, device_code, implicit
token_storage: secure_storage # Options: secure_storage, memory_only, encrypted_storage, httponly_cookie
pkce_enabled: true # Boolean
scope_strategy: minimal # Options: minimal, role_based, resource_specific
token_refresh: automatic # Options: automatic, manual, sliding_window
id_token_validation: strict # Options: strict, standard, relaxed
composes:
wraps:
- api-rest-resource-oriented
crud-single-model: # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
api_style: rest # Options: rest, graphql, rpc
validation_strategy: server-side # Options: server-side, client-side, both
soft_delete: false # Boolean
audit_logging: false # Boolean
pagination_default_size: 20 # Range: 10-100
finops-cost-allocation-tags: # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
tagging_strategy: hierarchical # Options: hierarchical, flat, hybrid
enforcement_level: required # Options: required, recommended, optional
cost_allocation_model: showback # Options: chargeback, showback, hybrid
tag_inheritance: true # Boolean
automated_tagging: true # Boolean
release-feature-flags: # Decouple deploy from release; safer experiments; needs kill switches and governance.
flag_storage: config_file # Options: config_file, database, feature_flag_service, environment_variables
evaluation_strategy: simple_boolean # Options: simple_boolean, percentage_rollout, user_targeting, multi_variate
targeting_capability: none # Options: none, user_attributes, context_based, advanced_segments
kill_switch_enabled: true # Boolean
audit_logging: false # Boolean
obs-telemetry-backend--aws-cloudwatch: # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
log_retention_days: 30 # Options: 1, 3, 7, 14, 30, 60, 90, 180, 365
xray_sampling_rate: 0.05 # Options: 0.01, 0.05, 0.1, 0.5, 1.0
obs-golden-signals: # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
latency_percentile: p95 # Options: p50, p95, p99, p99.9
error_rate_threshold: 1_percent # Options: 0.1_percent, 1_percent, 5_percent
saturation_metric: cpu_memory # Options: cpu_memory, queue_depth, connection_pool, disk_io
sli_window: 30_days # Options: 7_days, 30_days, 90_days
alert_severity_levels: critical_warning # Options: critical_only, critical_warning, critical_warning_info
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline: # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
export_backend: otlp # Options: otlp, jaeger, zipkin, prometheus, datadog, newrelic, honeycomb
trace_sampling_strategy: parent-based # Options: always-on, always-off, parent-based, trace-id-ratio
trace_sampling_rate: 1.0 # Range: 0.0-1.0
metrics_export_interval: 60 # Range: 10-300
log_correlation: true # Boolean
resource_detection: true # Boolean
propagation_format: w3c-tracecontext # Options: w3c-tracecontext, b3, jaeger, multi
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails: # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
budget_period: monthly # Options: monthly, quarterly, annual
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert # Options: alert, prevent, throttle
tagging_strategy: mandatory # Options: mandatory, recommended, optional
policy_enforcement: soft # Options: soft, hard, audit
cost_allocation_level: project # Options: project, team, environment, service
ops-slo-error-budgets: # Define SLOs and error budgets to balance reliability and velocity.
slo_target_percentage: 99.9 # Range: 90-99.999
measurement_window_days: 30 # Options: 7, 28, 30, 90
error_budget_policy: halt-deployments # Options: halt-deployments, alert-only, slow-rollouts, require-approval
sli_type: availability # Options: availability, latency, throughput, correctness, composite
alerting_threshold_percentage: 80 # Range: 50-100
P2:
arch-egress-minimization: # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
cdn_strategy: full # Options: full, static-only, none
data_locality: regional # Options: global, regional, single-zone
cross_region_replication: minimal # Options: none, minimal, full
compression_enabled: true # Boolean
static_asset_strategy: cdn_edge # Options: cdn_edge, regional_storage, origin_only
api-versioning-header: # Version via headers/media types; keeps URLs stable; harder to debug and cache.
version_header_name: API-Version # Options: API-Version, X-API-Version, Accept-Version, Custom-Header
version_format: date-based # Options: semantic, date-based, sequential
fallback_behavior: latest-stable # Options: latest-stable, oldest-supported, reject-request
content_negotiation: false # Boolean
deprecation_policy: warning-header # Options: sunset-header, warning-header, both
gov-system-manifest: # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml # Options: yaml, toml, json
manifest_scope: # Options: agent-tools, agent-skills, agent-models, agent-prompts, data_sources, services, external_dependencies
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true # Boolean
ci_validation: required # Options: required, optional, off
drift_policy: fail-build # Options: fail-build, warn-only, off
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks: # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
runbook_format: markdown # Options: markdown, wiki, structured_yaml, ticketing_system
incident_severity_levels: 4 # Options: 3, 4, 5
escalation_policy: tiered # Options: tiered, follow_the_sun, flat, hybrid
automation_integration: manual # Options: manual, semi_automated, fully_automated
review_frequency: quarterly # Options: monthly, quarterly, biannual, post_incident
gov-adrs-mandatory: # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
adr_format: madr # Options: madr, nygard, y-statements, custom
storage_location: docs/adrs # Options: docs/adrs, docs/architecture/decisions, adr, wiki
decision_threshold: significant # Options: all, significant, strategic-only
review_requirement: peer-review # Options: peer-review, architect-approval, team-consensus, none
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 50,650
# Pattern OpEx (monthly): $ 420
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 420
# Total TCO (24mo): $ 60,730
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
# ============================================================
# Cost Feasibility Analysis (Details)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
#
# Ops team size: 0 engineers (no ops cost)
#
# Ops Team Cost Algorithm (for reference):
# Formula: ops_team_size × single_resource_monthly_ops_usd × on_call_multiplier × deploy_freq_multiplier
# Based on:
# - Google SRE Handbook (2016): On-call burden = 25-50% FTE overhead
# - DORA State of DevOps (2021): Deploy frequency impact on ops overhead
#
# Calculating costs for 24 selected patterns:
#
# PER-PATTERN COSTS:
# ────────────────────────────────────────────────────────────
#
# 1. arch-serverless--aws (match score: 35.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 2. db-managed-postgres (match score: 32.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 3. resilience-timeouts-retries-backoff (match score: 29.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 4. arch-serverless-pay-per-use--aws-lambda (match score: 28.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 5. idp-oidc--cognito (match score: 26.00)
# Adoption: $200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 6. resilience-circuit-breaker (match score: 26.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 7. api-rest-resource-oriented (match score: 25.00)
# Adoption: $750.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 8. sync-request-reply-rest (match score: 25.00)
# Adoption: $300.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 9. deploy-blue-green (match score: 25.00)
# Adoption: $2,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 10. sec-auth-oauth2-oidc (match score: 23.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 11. crud-single-model (match score: 22.00)
# Adoption: $300.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 12. finops-cost-allocation-tags (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 13. arch-egress-minimization (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 14. release-feature-flags (match score: 19.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 15. api-versioning-header (match score: 16.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 16. obs-telemetry-backend--aws-cloudwatch (match score: 14.00)
# Adoption: $500.0
# Monthly (min): $20.0
# Monthly (expected): $20.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $20.0
#
# 17. obs-golden-signals (match score: 12.00)
# Adoption: $3,000.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 18. obs-open-telemetry-baseline (match score: 12.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 19. finops-budget-guardrails (match score: 10.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 20. iac-cloudformation (match score: 10.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 21. ops-runbooks (match score: 8.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 22. ops-slo-error-budgets (match score: 8.00)
# Adoption: $4,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 23. gov-system-manifest (match score: 7.00)
# Adoption: $4,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 24. gov-adrs-mandatory (match score: 7.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# Total Monthly OpEx: $420.0
# Monthly operational ceiling: $500 ✓ PASS
# ============================================================
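The summary figures in this report can be cross-checked by hand. The Python sketch below reproduces the arithmetic under stated assumptions: TCO is taken to be CapEx plus the amortization period times total monthly OpEx (which matches the reported $60,730), and a ceiling passes when the actual figure is at or below it. The class and function names are hypothetical, and the two multiplier arguments are placeholders because the compiler's actual multiplier values are not shown in this output.

```python
from dataclasses import dataclass


@dataclass
class CostSummary:
    capex_usd: float         # sum of the per-pattern "Adoption" figures
    pattern_opex_usd: float  # sum of the per-pattern "Monthly OpEx" figures
    ops_team_usd: float      # monthly ops team cost
    amortization_months: int

    @property
    def total_opex_usd(self) -> float:
        return self.pattern_opex_usd + self.ops_team_usd

    @property
    def tco_usd(self) -> float:
        # Assumption: TCO = CapEx + months * total monthly OpEx.
        return self.capex_usd + self.amortization_months * self.total_opex_usd


def ops_team_cost(team_size, monthly_usd_per_engineer,
                  on_call_multiplier, deploy_freq_multiplier):
    """Formula as printed in the report; multiplier values are hypothetical inputs."""
    return (team_size * monthly_usd_per_engineer
            * on_call_multiplier * deploy_freq_multiplier)


def check_ceiling(actual_usd, ceiling_usd):
    # Assumption: a ceiling passes when the actual cost does not exceed it.
    return "PASS" if actual_usd <= ceiling_usd else "FAIL"


summary = CostSummary(
    capex_usd=50_650,
    pattern_opex_usd=420,
    # ops_team_size is 0 in this run, so the multipliers do not matter here.
    ops_team_usd=ops_team_cost(0, 10_000, 1.0, 1.0),
    amortization_months=24,
)
print(summary.tco_usd)
print("CapEx ceiling:", check_ceiling(summary.capex_usd, 1_000))
print("OpEx ceiling:", check_ceiling(summary.total_opex_usd, 500))
```

With the values from the report, the CapEx check fails against the $1,000 ceiling while the monthly OpEx check passes against $500, which is the same verdict the compiler prints.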
The author has agreed a latency target with the team and adds it to the spec: p95 ≤ 50 ms and p99 ≤ 100 ms. Each new constraint narrows pattern selection, so some patterns that fit the baseline spec may be dropped on recompile because they cannot meet the 50 ms p95 target.
nfr:
  latency:
    p95Milliseconds: 50
    p99Milliseconds: 100
project:
  name: aws production api
  domain: product-catalog
constraints:
  cloud: aws
  language: javascript
  platform: api
nfr:
  availability:
    target: 0.999
  latency:
    p95Milliseconds: 50
    p99Milliseconds: 100
The compiler refuses to produce a final architecture. The policy
pattern caching-required-low-latency activated on
/nfr/latency/p99Milliseconds <= 100 and requires caching to be
opted in via constraints.features.caching: true. The output names
the exact threshold that triggered the rule and the constraint to
flip. The annotated body shows the gate firing on the latency field
and the ❌ marker on the caching feature flag, so the architectural
mismatch surfaces at spec time, not at deploy time. This is the
useful side of the rejection signal called out in the README's
Progressive-Refinement section: "if a hard NFR target cannot be
satisfied by any available pattern, the compiler exits with code 1
and explains what failed."
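The gate's behaviour can be sketched in a few lines of Python. This is a minimal illustration, not the compiler's implementation: the function names `get_path` and `caching_gate` are hypothetical, and the activation and requirement conditions are copied from the rejection message (cloud in the listed set, p99 at or below 100 ms, and constraints.features.caching required to be true).

```python
ALLOWED_CLOUDS = {"agnostic", "aws", "azure", "gcp", "on-prem", "n/a"}


def get_path(spec, pointer, default=None):
    """Resolve a slash-separated path such as /nfr/latency/p99Milliseconds."""
    node = spec
    for key in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def caching_gate(spec):
    """Return violation messages; empty when the gate is inactive or satisfied."""
    p99 = get_path(spec, "/nfr/latency/p99Milliseconds")
    cloud = get_path(spec, "/constraints/cloud")
    active = cloud in ALLOWED_CLOUDS and p99 is not None and p99 <= 100
    if not active:
        return []
    # An unset flag is treated like the assumed default (caching: false).
    if get_path(spec, "/constraints/features/caching", False) is True:
        return []
    return ["[caching-required-low-latency] /constraints/features/caching == True"]


spec = {
    "project": {"name": "aws production api", "domain": "product-catalog"},
    "constraints": {"cloud": "aws", "language": "javascript", "platform": "api"},
    "nfr": {
        "availability": {"target": 0.999},
        "latency": {"p95Milliseconds": 50, "p99Milliseconds": 100},
    },
}

violations = caching_gate(spec)
exit_code = 1 if violations else 0  # mirrors "exits with code 1" from the README
print(exit_code, violations)
```

Separating activation (does the rule apply to this spec?) from the requirement (is the spec compliant?) is what lets the message report both the threshold that fired and the constraint to change.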
# ─── ❌ COMPILER REJECTED THIS SPEC ───
❌ Constraints/NFRs trade-off requirements not met:
[caching-required-low-latency] /constraints/features/caching == True
→ At P99 <= 100ms, caching is mandatory to keep tail latency within the SLO. Set constraints.features.caching: true
💡 Suggestions — consider changing these activation fields:
caching-required-low-latency activated by:
/constraints/cloud in [agnostic | aws | azure | gcp | on-prem | n/a]
/nfr/latency/p99Milliseconds <= 100
# ─── input spec with pattern-activation annotations ───
# Each `# pattern-id` shows the patterns that activated on this spec value.
cloud: aws # caching-required-low-latency
p99Milliseconds: 100 # caching-required-low-latency
# ─── what the compiler FILLED IN as assumptions ───
assumptions:
constraints:
tenantCount: 1
features:
caching: false # ❌ caching-required-low-latency
async_messaging: false
ai_inference: false
agentic_system: null
multi_tenancy: false
batch_processing: false
distributed_transactions: false
real_time_streaming: false
vector_search: false
document_store: false
key_value_store: false
graph_database: false
time_series_db: false
oltp_workload: true
olap_workload: false
cold_archive_tiering: false
messaging_delivery_guarantee: null
nfr:
rpo_minutes: 60
rto_minutes: 60
latency:
jobStartP95Seconds: null
jobStartP99Seconds: null
throughput:
peak_jobs_per_hour: null
peak_query_per_second_read: 5
peak_query_per_second_write: 1
data:
retention_days: 90
pii: false
compliance:
consistency:
needsReadYourWrites: false
durability:
strict: false
security:
# … (more defaults below; expand the full output to see them)
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws # caching-required-low-latency
language: javascript
platform: api
nfr:
availability:
target: 0.999
latency:
p95Milliseconds: 50
p99Milliseconds: 100 # caching-required-low-latency
assumptions:
constraints:
saas-providers:
disallowed-saas-providers:
ai-inference-platforms:
disallowed-ai-inference-platforms:
model-vendors:
disallowed-model-vendors:
tenantCount: 1
features:
caching: false # ❌ caching-required-low-latency
async_messaging: false
ai_inference: false
agentic_system: null
multi_tenancy: false
batch_processing: false
distributed_transactions: false
real_time_streaming: false
vector_search: false
document_store: false
key_value_store: false
graph_database: false
time_series_db: false
oltp_workload: true
olap_workload: false
cold_archive_tiering: false
messaging_delivery_guarantee: null
nfr:
rpo_minutes: 60
rto_minutes: 60
latency:
jobStartP95Seconds: null
jobStartP99Seconds: null
throughput:
peak_jobs_per_hour: null
peak_query_per_second_read: 5
peak_query_per_second_write: 1
data:
retention_days: 90
pii: false
compliance:
gdpr: false
gdpr_rtbf: false
ccpa: false
hipaa: false
sox: false
consistency:
needsReadYourWrites: false
durability:
strict: false
security:
auth: oauth2_oidc
tenant_isolation: n/a
audit_logging: false
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
single_resource_monthly_ops_usd: 10000
amortization_months: 24
cost:
intent:
priority: minimize-opex
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
preferences:
prefer_free_tier_if_possible: true
prefer_saas_first: false
patterns:
meta:
caching-required-low-latency:
P0:
arch-serverless--aws:
compute_service: lambda
api_gateway: api-gateway-http
database: dynamodb
storage: s3-standard
auth_service: cognito
event_bus: eventbridge
db-managed-postgres:
provider: supabase
instance_size: small
storage_gb: 8
backup_retention_days: 7
connection_pooling: true
high_availability: false
ssl_mode: require
arch-serverless-pay-per-use--aws-lambda:
memory_size: 512
timeout: 30
architecture: x86_64
provisioned_concurrency: 0
reserved_concurrency: null
iac-cloudformation:
stack_naming_convention: project-environment-resource
change_set_enabled: true
termination_protection: false
drift_detection: true
stack_policy: none
P1:
resilience-circuit-breaker:
failure_threshold: 5
success_threshold: 2
timeout_duration: 60
half_open_max_calls: 1
fallback_strategy: fail_fast
api-rest-resource-oriented:
pagination_style: offset
max_page_size: 100
versioning_strategy: uri
filtering_style: query_params
cache_strategy: etag
id_format: uuid
response_envelope: false
deploy-blue-green:
traffic_switch_strategy: instant
health_check_required: true
rollback_strategy: automatic
environment_parity: full
warm_up_period_minutes: 5
composes:
layered_after:
- iac-cloudformation
sec-auth-oauth2-oidc:
oauth_flow: authorization_code
token_storage: secure_storage
pkce_enabled: true
scope_strategy: minimal
token_refresh: automatic
id_token_validation: strict
composes:
wraps:
- api-rest-resource-oriented
crud-single-model:
api_style: rest
validation_strategy: server-side
soft_delete: false
audit_logging: false
pagination_default_size: 20
finops-cost-allocation-tags:
tagging_strategy: hierarchical
enforcement_level: required
cost_allocation_model: showback
tag_inheritance: true
automated_tagging: true
release-feature-flags:
flag_storage: config_file
evaluation_strategy: simple_boolean
targeting_capability: none
kill_switch_enabled: true
audit_logging: false
obs-telemetry-backend--aws-cloudwatch:
log_retention_days: 30
xray_sampling_rate: 0.05
obs-golden-signals:
latency_percentile: p95
error_rate_threshold: 1_percent
saturation_metric: cpu_memory
sli_window: 30_days
alert_severity_levels: critical_warning
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline:
export_backend: otlp
trace_sampling_strategy: parent-based
trace_sampling_rate: 1.0
metrics_export_interval: 60
log_correlation: true
resource_detection: true
propagation_format: w3c-tracecontext
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails:
budget_period: monthly
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert
tagging_strategy: mandatory
policy_enforcement: soft
cost_allocation_level: project
ops-slo-error-budgets:
slo_target_percentage: 99.9
measurement_window_days: 30
error_budget_policy: halt-deployments
sli_type: availability
alerting_threshold_percentage: 80
P2:
arch-egress-minimization:
cdn_strategy: full
data_locality: regional
cross_region_replication: minimal
compression_enabled: true
static_asset_strategy: cdn_edge
api-versioning-header:
version_header_name: API-Version
version_format: date-based
fallback_behavior: latest-stable
content_negotiation: false
deprecation_policy: warning-header
gov-system-manifest:
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml
manifest_scope:
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true
ci_validation: required
drift_policy: fail-build
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks:
runbook_format: markdown
incident_severity_levels: 4
escalation_policy: tiered
automation_integration: manual
review_frequency: quarterly
gov-adrs-mandatory:
adr_format: madr
storage_location: docs/adrs
decision_threshold: significant
review_requirement: peer-review
❌ Constraints/NFRs trade-off requirements not met:
[caching-required-low-latency] /constraints/features/caching == True
→ At P99 <= 100ms, caching is mandatory to keep tail latency within the SLO. Set constraints.features.caching: true
💡 Suggestions — consider changing these activation fields:
caching-required-low-latency activated by:
/constraints/cloud in [agnostic | aws | azure | gcp | on-prem | n/a]
/nfr/latency/p99Milliseconds <= 100
Feature flags live under constraints.features, not
under nfr — they represent opt-in capabilities, not
performance targets. The author flips
features.caching: true to bring cache-aside-related
patterns into scope.
constraints:
  features:
    caching: true
project:
  name: aws production api
  domain: product-catalog
constraints:
  cloud: aws
  language: javascript
  platform: api
  features:
    caching: true
nfr:
  availability:
    target: 0.999
  latency:
    p95Milliseconds: 50
    p99Milliseconds: 100
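Each step in this walkthrough pairs a small delta with the full spec that results from applying it. One way to picture that relationship is a recursive merge of the delta into the previous spec. The sketch below is an illustration only; `deep_merge` is a hypothetical helper and nothing here implies the skill tooling applies deltas this way (the author may simply edit the file by hand).

```python
import copy


def deep_merge(base, delta):
    """Return a new dict with delta applied on top of base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in delta.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


previous_spec = {
    "project": {"name": "aws production api", "domain": "product-catalog"},
    "constraints": {"cloud": "aws", "language": "javascript", "platform": "api"},
    "nfr": {
        "availability": {"target": 0.999},
        "latency": {"p95Milliseconds": 50, "p99Milliseconds": 100},
    },
}

# The one-signal change for this step: opt in to caching.
delta = {"constraints": {"features": {"caching": True}}}

new_spec = deep_merge(previous_spec, delta)
print(new_spec["constraints"])
```

Because the merge recurses into `constraints`, the existing cloud, language, and platform keys survive and only the new `features.caching` key is added.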
cache-aside--redis joins the pattern set (this compile's output
lists it as cache-aside with cache_backend: redis). But peak read
QPS isn't specified in the spec, so the compiler falls back to its
assumed default of 5 req/s and can't confirm caching is worth its
overhead. A warn_nfr advisory fires: it reports that peak read QPS
is 5 req/s, below the 10 req/s mark, and that caching overhead may
outweigh the benefit at this scale. The compiler is telling the
author that the pattern is selected but probably overkill, which is
a prompt to provide real throughput data or reconsider the feature
flag.
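The advisory logic can be approximated as follows. This is a sketch under assumptions: the function name `caching_advisories` is hypothetical, the 10 req/s threshold and the 5 req/s default are taken from the compiler output for this step, and the spec path nfr.throughput.peak_query_per_second_read is assumed to mirror the key shown in the assumptions block.

```python
LOW_READ_QPS_THRESHOLD = 10  # from the advisory text: "<10 req/s"


def caching_advisories(spec, assumptions):
    """Return advisory messages; the pattern stays selected either way."""
    features = spec.get("constraints", {}).get("features", {})
    if not features.get("caching"):
        return []
    declared = (spec.get("nfr", {})
                    .get("throughput", {})
                    .get("peak_query_per_second_read"))
    # Fall back to the compiler-filled default when the author gave no number.
    if declared is None:
        qps = assumptions["nfr"]["throughput"]["peak_query_per_second_read"]
    else:
        qps = declared
    if qps < LOW_READ_QPS_THRESHOLD:
        return [f"cache-aside: peak read QPS is {qps} req/s "
                f"(<{LOW_READ_QPS_THRESHOLD} req/s)"]
    return []


spec = {"constraints": {"features": {"caching": True}}}
assumptions = {"nfr": {"throughput": {"peak_query_per_second_read": 5}}}

warnings = caching_advisories(spec, assumptions)
print(warnings)
```

Unlike the policy gate, an advisory never blocks the compile. Supplying a real read QPS at or above the threshold makes the returned list empty, which corresponds to the warning resolving once throughput numbers land in the spec.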
# ─── what the compiler FILLED IN as assumptions ───
assumptions:
constraints:
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, arch-serverless-pay-per-use--aws-lambda, caching-required-low-latency, ... (8 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
document_store: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (7 more)
throughput:
peak_query_per_second_read: 5 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
peak_query_per_second_write: 1 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
# … (more defaults below; expand the full output to see them)
# ─── Matched Patterns based on input spec ───
# meta = policy gates (always emitted when their feature flag is set)
# P0 = high priority — load-bearing architectural decisions
# P1 = mid priority — operational + observability + security baseline
# P2/P3 = lower priority — refinements + governance + docs
# Override priority by adding `patterns.<id>.recommended_priority: P0` to spec.
patterns:
meta: # (1 pattern)
- caching-required-low-latency # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0: # (4 patterns)
- arch-serverless--aws # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
- db-managed-postgres # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
- arch-serverless-pay-per-use--aws-lambda # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
- iac-cloudformation # AWS-native IaC; deep service coverage; AWS-specific.
P1: # (13 patterns)
- resilience-circuit-breaker # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
- cache-aside # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
composes:
wraps: ['db-managed-postgres']
- api-rest-resource-oriented # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
- deploy-blue-green # Two environments; switch traffic; fast rollback; higher infra cost.
composes:
layered_after: ['iac-cloudformation']
- sec-auth-oauth2-oidc # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
composes:
wraps: ['api-rest-resource-oriented']
- crud-single-model # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
- finops-cost-allocation-tags # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
- release-feature-flags # Decouple deploy from release; safer experiments; needs kill switches and governance.
- obs-telemetry-backend--aws-cloudwatch # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
- obs-golden-signals # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
composes:
layered_after: ['obs-open-telemetry-baseline']
- obs-open-telemetry-baseline # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
composes:
co_runs_with: ['api-rest-resource-oriented']
- finops-budget-guardrails # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
- ops-slo-error-budgets # Define SLOs and error budgets to balance reliability and velocity.
P2: # (3 patterns)
- arch-egress-minimization # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
- api-versioning-header # Version via headers/media types; keeps URLs stable; harder to debug and cache.
- gov-system-manifest # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
composes:
layered_after: ['iac-cloudformation']
co_runs_with: ['release-feature-flags', 'gov-adrs-mandatory', 'ops-runbooks']
P3: # (2 patterns)
- ops-runbooks # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
- gov-adrs-mandatory # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
# ─── warns and cost feasibility ───
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 23
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 49,750
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 435
# Total TCO (24mo): $ 60,190
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
# ============================================================
# ⚠️ Pattern Advisory Warnings
# (Patterns are still SELECTED — review these before finalizing)
# ============================================================
#
# [warning] warn_nfr:
# cache-aside: peak read QPS is 5 req/s (<10 req/s). Caching overhead (infrastructure, invalidation, serialization) may outweigh benefit at this scale. Consider simplifying to direct DB access.
#
# Suggestions:
# - Cache-aside adds infrastructure complexity (cache store, invalidation logic, serialization). At <10 read req/s the overhead rarely justifies the benefit.
#
# ============================================================
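The composes entries in the matched-pattern list above can be read as directed edges between patterns. The sketch below flattens them into (source, relation, target) triples. The edge data is copied from this compile's output; the function name `composes_edges` and the triple representation are assumptions made for illustration, not the registry's actual data model.

```python
RELATIONS = ("wraps", "layered_after", "co_runs_with")

# composes declarations as printed for the selected patterns in this compile
composes = {
    "cache-aside": {"wraps": ["db-managed-postgres"]},
    "deploy-blue-green": {"layered_after": ["iac-cloudformation"]},
    "sec-auth-oauth2-oidc": {"wraps": ["api-rest-resource-oriented"]},
    "obs-golden-signals": {"layered_after": ["obs-open-telemetry-baseline"]},
    "obs-open-telemetry-baseline": {"co_runs_with": ["api-rest-resource-oriented"]},
    "gov-system-manifest": {
        "layered_after": ["iac-cloudformation"],
        "co_runs_with": ["release-feature-flags", "gov-adrs-mandatory", "ops-runbooks"],
    },
}


def composes_edges(declarations):
    """Flatten composes declarations into (source, relation, target) triples."""
    edges = []
    for source, relations in declarations.items():
        for relation in RELATIONS:
            for target in relations.get(relation, []):
                edges.append((source, relation, target))
    return edges


edges = composes_edges(composes)
# Which selected patterns are layered on top of the IaC pattern?
dependents = sorted(src for src, rel, tgt in edges
                    if rel == "layered_after" and tgt == "iac-cloudformation")
print(len(edges), dependents)
```

A flat edge list like this is enough to answer simple questions, such as which patterns sit on top of iac-cloudformation, before any graph rendering is involved.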
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
language: javascript # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
platform: api # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
features:
caching: true # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (8 more)
nfr:
availability:
target: 0.999 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
latency:
p95Milliseconds: 50 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
p99Milliseconds: 100 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (4 more)
assumptions:
constraints:
saas-providers: []
disallowed-saas-providers: []
ai-inference-platforms: []
disallowed-ai-inference-platforms: []
model-vendors: []
disallowed-model-vendors: []
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, arch-serverless-pay-per-use--aws-lambda, caching-required-low-latency, ... (8 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (11 more)
document_store: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (10 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (7 more)
throughput:
peak_query_per_second_read: 5 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
peak_query_per_second_write: 1 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
gdpr: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
gdpr_rtbf: false
ccpa: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
hipaa: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (19 more)
sox: false # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (20 more)
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
single_resource_monthly_ops_usd: 10000
amortization_months: 24
cost:
intent:
priority: minimize-opex
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
preferences:
prefer_free_tier_if_possible: true # db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, caching-required-low-latency, ... (2 more)
prefer_saas_first: false
patterns:
meta:
caching-required-low-latency: {} # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0:
arch-serverless--aws: # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
compute_service: lambda # Options: lambda, fargate
api_gateway: api-gateway-http # Options: api-gateway-http, api-gateway-rest, alb, function-url
database: dynamodb # Options: dynamodb, aurora-serverless, rds-proxy
storage: s3-standard # Options: s3-standard, s3-intelligent-tiering, efs
auth_service: cognito # Options: cognito, lambda-authorizer, iam
event_bus: eventbridge # Options: eventbridge, sns-sqs, kinesis
db-managed-postgres: # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
provider: supabase # Options: supabase, neon, render, railway, digitalocean-app-platform
instance_size: small # Options: micro, small, medium, large
storage_gb: 8 # Range: 1-500
backup_retention_days: 7 # Range: 1-30
connection_pooling: true # Boolean
high_availability: false # Boolean
ssl_mode: require # Options: disable, allow, prefer, require, verify-ca, verify-full
arch-serverless-pay-per-use--aws-lambda: # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
memory_size: 512 # Options: 128, 256, 512, 1024, 2048, 3008
timeout: 30 # Range: 3-900
architecture: x86_64 # Options: x86_64, arm64
provisioned_concurrency: 0 # Range: 0-1000
iac-cloudformation: # AWS-native IaC; deep service coverage; AWS-specific.
stack_naming_convention: project-environment-resource # Options: project-environment-resource, environment-project-resource, flat-naming, custom
change_set_enabled: true # Boolean
termination_protection: false # Boolean
drift_detection: true # Boolean
stack_policy: none # Options: none, protect-all, protect-data-resources, custom
P1:
resilience-circuit-breaker: # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
failure_threshold: 5 # Range: 1-20
success_threshold: 2 # Range: 1-10
timeout_duration: 60 # Range: 5-300
half_open_max_calls: 1 # Range: 1-10
fallback_strategy: fail_fast # Options: fail_fast, cached_response, default_value, alternative_service
cache-aside: # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
invalidation_strategy: ttl # Options: ttl, event-based, manual, lru
ttl_seconds: 3600 # Range: 1-86400
max_memory_mb: 512 # Range: 128-16384
cache_backend: redis # Options: redis, memcached, in-memory, hazelcast
write_strategy: write-through # Options: write-through, write-behind, invalidate-only
serialization_format: json # Options: json, msgpack, protobuf, pickle
composes:
wraps:
- db-managed-postgres
api-rest-resource-oriented: # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
pagination_style: offset # Options: offset, cursor, page_number
max_page_size: 100 # Range: 10-1000
versioning_strategy: uri # Options: uri, header, query_param, none
filtering_style: query_params # Options: query_params, json_body, graphql_like
cache_strategy: etag # Options: etag, last_modified, cache_control, none
id_format: uuid # Options: uuid, integer, slug, composite
response_envelope: false # Boolean
deploy-blue-green: # Two environments; switch traffic; fast rollback; higher infra cost.
traffic_switch_strategy: instant # Options: instant, gradual, canary
health_check_required: true # Boolean
rollback_strategy: automatic # Options: automatic, manual, disabled
environment_parity: full # Options: full, scaled-down, minimal
warm_up_period_minutes: 5 # Range: 0-60
composes:
layered_after:
- iac-cloudformation
sec-auth-oauth2-oidc: # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
oauth_flow: authorization_code # Options: authorization_code, client_credentials, device_code, implicit
token_storage: secure_storage # Options: secure_storage, memory_only, encrypted_storage, httponly_cookie
pkce_enabled: true # Boolean
scope_strategy: minimal # Options: minimal, role_based, resource_specific
token_refresh: automatic # Options: automatic, manual, sliding_window
id_token_validation: strict # Options: strict, standard, relaxed
composes:
wraps:
- api-rest-resource-oriented
crud-single-model: # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
api_style: rest # Options: rest, graphql, rpc
validation_strategy: server-side # Options: server-side, client-side, both
soft_delete: false # Boolean
audit_logging: false # Boolean
pagination_default_size: 20 # Range: 10-100
finops-cost-allocation-tags: # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
tagging_strategy: hierarchical # Options: hierarchical, flat, hybrid
enforcement_level: required # Options: required, recommended, optional
cost_allocation_model: showback # Options: chargeback, showback, hybrid
tag_inheritance: true # Boolean
automated_tagging: true # Boolean
release-feature-flags: # Decouple deploy from release; safer experiments; needs kill switches and governance.
flag_storage: config_file # Options: config_file, database, feature_flag_service, environment_variables
evaluation_strategy: simple_boolean # Options: simple_boolean, percentage_rollout, user_targeting, multi_variate
targeting_capability: none # Options: none, user_attributes, context_based, advanced_segments
kill_switch_enabled: true # Boolean
audit_logging: false # Boolean
obs-telemetry-backend--aws-cloudwatch: # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
log_retention_days: 30 # Options: 1, 3, 7, 14, 30, 60, 90, 180, 365
xray_sampling_rate: 0.05 # Options: 0.01, 0.05, 0.1, 0.5, 1.0
obs-golden-signals: # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
latency_percentile: p95 # Options: p50, p95, p99, p99.9
error_rate_threshold: 1_percent # Options: 0.1_percent, 1_percent, 5_percent
saturation_metric: cpu_memory # Options: cpu_memory, queue_depth, connection_pool, disk_io
sli_window: 30_days # Options: 7_days, 30_days, 90_days
alert_severity_levels: critical_warning # Options: critical_only, critical_warning, critical_warning_info
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline: # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
export_backend: otlp # Options: otlp, jaeger, zipkin, prometheus, datadog, newrelic, honeycomb
trace_sampling_strategy: parent-based # Options: always-on, always-off, parent-based, trace-id-ratio
trace_sampling_rate: 1.0 # Range: 0.0-1.0
metrics_export_interval: 60 # Range: 10-300
log_correlation: true # Boolean
resource_detection: true # Boolean
propagation_format: w3c-tracecontext # Options: w3c-tracecontext, b3, jaeger, multi
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails: # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
budget_period: monthly # Options: monthly, quarterly, annual
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert # Options: alert, prevent, throttle
tagging_strategy: mandatory # Options: mandatory, recommended, optional
policy_enforcement: soft # Options: soft, hard, audit
cost_allocation_level: project # Options: project, team, environment, service
ops-slo-error-budgets: # Define SLOs and error budgets to balance reliability and velocity.
slo_target_percentage: 99.9 # Range: 90-99.999
measurement_window_days: 30 # Options: 7, 28, 30, 90
error_budget_policy: halt-deployments # Options: halt-deployments, alert-only, slow-rollouts, require-approval
sli_type: availability # Options: availability, latency, throughput, correctness, composite
alerting_threshold_percentage: 80 # Range: 50-100
P2:
arch-egress-minimization: # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
cdn_strategy: full # Options: full, static-only, none
data_locality: regional # Options: global, regional, single-zone
cross_region_replication: minimal # Options: none, minimal, full
compression_enabled: true # Boolean
static_asset_strategy: cdn_edge # Options: cdn_edge, regional_storage, origin_only
api-versioning-header: # Version via headers/media types; keeps URLs stable; harder to debug and cache.
version_header_name: API-Version # Options: API-Version, X-API-Version, Accept-Version, Custom-Header
version_format: date-based # Options: semantic, date-based, sequential
fallback_behavior: latest-stable # Options: latest-stable, oldest-supported, reject-request
content_negotiation: false # Boolean
deprecation_policy: warning-header # Options: sunset-header, warning-header, both
gov-system-manifest: # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml # Options: yaml, toml, json
manifest_scope: # Options: agent-tools, agent-skills, agent-models, agent-prompts, data_sources, services, external_dependencies
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true # Boolean
ci_validation: required # Options: required, optional, off
drift_policy: fail-build # Options: fail-build, warn-only, off
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks: # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
runbook_format: markdown # Options: markdown, wiki, structured_yaml, ticketing_system
incident_severity_levels: 4 # Options: 3, 4, 5
escalation_policy: tiered # Options: tiered, follow_the_sun, flat, hybrid
automation_integration: manual # Options: manual, semi_automated, fully_automated
review_frequency: quarterly # Options: monthly, quarterly, biannual, post_incident
gov-adrs-mandatory: # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
adr_format: madr # Options: madr, nygard, y-statements, custom
storage_location: docs/adrs # Options: docs/adrs, docs/architecture/decisions, adr, wiki
decision_threshold: significant # Options: all, significant, strategic-only
review_requirement: peer-review # Options: peer-review, architect-approval, team-consensus, none
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 23
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 49,750
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 435
# Total TCO (24mo): $ 60,190
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
# ============================================================
# Cost Feasibility Analysis (Details)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
#
# Ops team size: 0 engineers (no ops cost)
#
# Ops Team Cost Algorithm (for reference):
# Formula: ops_team_size × single_resource_monthly_ops_usd × on_call_multiplier × deploy_freq_multiplier
# Based on:
# - Google SRE Handbook (2016): On-call burden = 25-50% FTE overhead
# - DORA State of DevOps (2021): Deploy frequency impact on ops overhead
#
# Calculating costs for 23 selected patterns:
#
# PER-PATTERN COSTS:
# ────────────────────────────────────────────────────────────
#
# 1. arch-serverless--aws (match score: 35.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 2. db-managed-postgres (match score: 32.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 3. arch-serverless-pay-per-use--aws-lambda (match score: 28.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 4. caching-required-low-latency (match score: 27.00)
# Adoption: $0.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 5. resilience-circuit-breaker (match score: 26.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 6. cache-aside (match score: 26.00)
# Adoption: $800.0
# Monthly (min): $15.0
# Monthly (expected): $15.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $15.0
#
# 7. api-rest-resource-oriented (match score: 25.00)
# Adoption: $750.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 8. deploy-blue-green (match score: 25.00)
# Adoption: $2,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 9. sec-auth-oauth2-oidc (match score: 23.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 10. crud-single-model (match score: 22.00)
# Adoption: $300.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 11. finops-cost-allocation-tags (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 12. arch-egress-minimization (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 13. release-feature-flags (match score: 19.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 14. api-versioning-header (match score: 16.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 15. obs-telemetry-backend--aws-cloudwatch (match score: 14.00)
# Adoption: $500.0
# Monthly (min): $20.0
# Monthly (expected): $20.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $20.0
#
# 16. obs-golden-signals (match score: 12.00)
# Adoption: $3,000.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 17. obs-open-telemetry-baseline (match score: 12.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 18. finops-budget-guardrails (match score: 10.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 19. iac-cloudformation (match score: 10.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 20. ops-runbooks (match score: 8.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 21. ops-slo-error-budgets (match score: 8.00)
# Adoption: $4,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 22. gov-system-manifest (match score: 7.00)
# Adoption: $4,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 23. gov-adrs-mandatory (match score: 7.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# Total Monthly OpEx: $435.0
# Monthly operational ceiling: $500 ✓ PASS
# ============================================================
# ============================================================
# ⚠️ Pattern Advisory Warnings
# (Patterns are still SELECTED — review these before finalizing)
# ============================================================
#
# [warning] warn_nfr:
# cache-aside: peak read QPS is 5 req/s (<10 req/s). Caching overhead (infrastructure, invalidation, serialization) may outweigh benefit at this scale. Consider simplifying to direct DB access.
#
# Suggestions:
# - Cache-aside adds infrastructure complexity (cache store, invalidation logic, serialization). At <10 read req/s the overhead rarely justifies the benefit.
#
# ============================================================
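The figures in the cost summary above can be checked by hand. The block below is a minimal sketch under three assumptions. First, TCO is taken to be one-time CapEx plus total monthly OpEx over the amortization window; this is inferred from the printed numbers, not stated by the compiler. Second, a ceiling is assumed to pass when the cost is at or below it. Third, the two multipliers in the quoted ops-team formula are hypothetical placeholders set to 1.0, because the output names them without listing their values. The function names are illustrative, not the compiler's API.

```python
# Sketch only: reproduces the printed cost summary by hand.
# The TCO formula and the "<= ceiling" rule are inferred, and the multiplier
# defaults are hypothetical placeholders (the output does not list them).

def ops_team_cost(ops_team_size, single_resource_monthly_ops_usd,
                  on_call_multiplier=1.0, deploy_freq_multiplier=1.0):
    """Monthly ops team cost, following the formula quoted in the details block."""
    return (ops_team_size * single_resource_monthly_ops_usd
            * on_call_multiplier * deploy_freq_multiplier)


def total_cost_of_ownership(capex, monthly_opex, amortization_months):
    """Assumed TCO: one-time CapEx plus OpEx accumulated over the window."""
    return capex + monthly_opex * amortization_months


# The six non-zero "Monthly OpEx" rows from the per-pattern details.
pattern_opex = 15 + 100 + 20 + 100 + 100 + 100

# Zero engineers means zero ops cost, whatever the multipliers are.
ops_cost = ops_team_cost(ops_team_size=0, single_resource_monthly_ops_usd=10_000)

total_opex = pattern_opex + ops_cost
tco = total_cost_of_ownership(capex=49_750, monthly_opex=total_opex,
                              amortization_months=24)

capex_ok = 49_750 <= 1_000   # CapEx ceiling from the summary
opex_ok = total_opex <= 500  # monthly OpEx ceiling from the summary

print(total_opex, tco, capex_ok, opex_ok)
```

Under these assumptions the sketch lands on the same totals as the summary: $435 monthly OpEx, $60,190 TCO over 24 months, a failed CapEx ceiling and a passing OpEx ceiling.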
The author looks up actual peak load: 20 read QPS, 10 write QPS. Modest by AWS API standards, but real numbers. With concrete throughput data the compiler can make a definitive call — either the advisory disappears (caching is justified at this rate) or it persists with sharper reasoning. Either outcome is more useful than a pattern selected in the dark.
nfr:
  throughput:
    peak_query_per_second_read: 20
    peak_query_per_second_write: 10
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws
language: javascript
platform: api
features:
caching: true
nfr:
availability:
target: 0.999
latency:
p95Milliseconds: 50
p99Milliseconds: 100
throughput:
peak_query_per_second_read: 20
peak_query_per_second_write: 10
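The earlier advisory quoted a peak read rate of 5 req/s against a stated cut-off of 10 req/s. As a rough sketch, assuming the advisory is a plain threshold comparison on read QPS (the function name and return shape here are hypothetical, not the compiler's own code), the new throughput numbers would clear it:

```python
# Sketch only: the 10 req/s threshold comes from the advisory text;
# the function name and return shape are hypothetical.
LOW_READ_QPS_THRESHOLD = 10


def cache_aside_advisory(peak_read_qps):
    """Return a warning string when read traffic is below the threshold, else None."""
    if peak_read_qps < LOW_READ_QPS_THRESHOLD:
        return (f"cache-aside: peak read QPS is {peak_read_qps} req/s "
                f"(<{LOW_READ_QPS_THRESHOLD} req/s)")
    return None


before = cache_aside_advisory(5)    # the rate quoted in the earlier advisory
after = cache_aside_advisory(20)    # peak_query_per_second_read from the spec

print(before)
print(after)
```

The real rule may weigh more signals than read QPS alone; the sketch only shows why 20 req/s falls on the other side of the stated cut-off.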
The pattern set stabilises. The verbose output lists every selected
pattern under its priority bucket (meta / P0 / P1 / P2 / P3) with the
full per-pattern defaultConfig values plus inline # Options:
alternatives on each line. Below the compile output, the Mermaid
composes graph visualises every inter-pattern relationship the
registry declares for the selected patterns. Solid arrows are kept
edges: the target pattern is in the selection, so the implementing
agent will wire these. Dashed arrows are pruned edges: the target
wasn't selected, so the relationship doesn't survive into this
architecture.
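The kept/pruned distinction can be sketched as a membership test on each edge's target. This is an illustration only: the `cache-aside` and `sec-auth-oauth2-oidc` edges are copied from the composes blocks in the compile output, while `example-pattern` and `example-unselected-pattern` are made-up names added to show a pruned edge, and `classify_edges` is a hypothetical helper, not part of the compiler.

```python
# Sketch only: classify composes edges as kept (solid) or pruned (dashed).
# `example-pattern` / `example-unselected-pattern` are invented for illustration.

def classify_edges(composes, selected):
    """Split (source, relation, target) edges by whether the target is selected."""
    kept, pruned = [], []
    for source, relations in composes.items():
        for relation, targets in relations.items():
            for target in targets:
                edge = (source, relation, target)
                if target in selected:
                    kept.append(edge)
                else:
                    pruned.append(edge)
    return kept, pruned


selected = {
    "cache-aside",
    "db-managed-postgres",
    "sec-auth-oauth2-oidc",
    "api-rest-resource-oriented",
    "example-pattern",
}

composes = {
    "cache-aside": {"wraps": ["db-managed-postgres"]},
    "sec-auth-oauth2-oidc": {"wraps": ["api-rest-resource-oriented"]},
    "example-pattern": {"co_runs_with": ["example-unselected-pattern"]},
}

kept, pruned = classify_edges(composes, selected)
print("solid:", kept)
print("dashed:", pruned)
```

Only the kept list would matter to a downstream consumer; the pruned list exists so the graph can still show what the registry declared.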
# ─── what the compiler FILLED IN as assumptions ───
assumptions:
constraints:
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, deploy-canary, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (8 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
single_resource_monthly_ops_usd: 10000
amortization_months: 24
cost:
# … (more defaults below; expand the full output to see them)
# ─── pattern SELECTION with per-pattern config + alternatives ───
# Each line under a pattern shows the value the compiler ASSUMED.
# The `# Options: …` annotation lists alternatives you can override
# by setting `patterns.<pid>.<field>` in the spec (see next step).
patterns:
meta:
caching-required-low-latency: {} # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0:
arch-serverless--aws: # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
compute_service: lambda # Options: lambda, fargate
api_gateway: api-gateway-http # Options: api-gateway-http, api-gateway-rest, alb, function-url
database: dynamodb # Options: dynamodb, aurora-serverless, rds-proxy
storage: s3-standard # Options: s3-standard, s3-intelligent-tiering, efs
auth_service: cognito # Options: cognito, lambda-authorizer, iam
event_bus: eventbridge # Options: eventbridge, sns-sqs, kinesis
db-managed-postgres: # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
provider: supabase # Options: supabase, neon, render, railway, digitalocean-app-platform
instance_size: small # Options: micro, small, medium, large
storage_gb: 8 # Range: 1-500
backup_retention_days: 7 # Range: 1-30
connection_pooling: true # Boolean
high_availability: false # Boolean
ssl_mode: require # Options: disable, allow, prefer, require, verify-ca, verify-full
arch-serverless-pay-per-use--aws-lambda: # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
memory_size: 512 # Options: 128, 256, 512, 1024, 2048, 3008
timeout: 30 # Range: 3-900
architecture: x86_64 # Options: x86_64, arm64
provisioned_concurrency: 0 # Range: 0-1000
iac-cloudformation: # AWS-native IaC; deep service coverage; AWS-specific.
stack_naming_convention: project-environment-resource # Options: project-environment-resource, environment-project-resource, flat-naming, custom
change_set_enabled: true # Boolean
termination_protection: false # Boolean
drift_detection: true # Boolean
stack_policy: none # Options: none, protect-all, protect-data-resources, custom
P1:
deploy-canary: # Release to small traffic slice; monitor; gradually increase; needs good metrics and routing controls.
initial_traffic_percentage: 5 # Range: 1-50
increment_percentage: 10 # Range: 5-50
increment_interval_minutes: 15 # Range: 5-120
monitoring_window_minutes: 10 # Range: 5-60
rollback_on_error_threshold: 5 # Range: 0.1-10
success_criteria: error_rate_and_latency # Options: error_rate_only, error_rate_and_latency, error_rate_and_latency_and_saturation, custom_metrics
composes:
layered_after:
- iac-cloudformation
resilience-rate-limiting: # Protect from overload; enforce per-tenant quotas; supports fairness and cost control.
algorithm: token-bucket # Options: token-bucket, leaky-bucket, fixed-window, sliding-window
scope: per-tenant # Options: global, per-tenant, per-user, per-ip
enforcement_point: application # Options: gateway, application, distributed
quota_type: requests # Options: requests, compute-time, data-volume, cost-based
burst_allowance: enabled # Options: enabled, disabled, limited
composes:
wraps:
- api-rest-resource-oriented
resilience-circuit-breaker: # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
failure_threshold: 5 # Range: 1-20
success_threshold: 2 # Range: 1-10
timeout_duration: 60 # Range: 5-300
half_open_max_calls: 1 # Range: 1-10
fallback_strategy: fail_fast # Options: fail_fast, cached_response, default_value, alternative_service
cache-aside: # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
invalidation_strategy: ttl # Options: ttl, event-based, manual, lru
ttl_seconds: 3600 # Range: 1-86400
max_memory_mb: 512 # Range: 128-16384
cache_backend: redis # Options: redis, memcached, in-memory, hazelcast
write_strategy: write-through # Options: write-through, write-behind, invalidate-only
serialization_format: json # Options: json, msgpack, protobuf, pickle
composes:
wraps:
- db-managed-postgres
api-rest-resource-oriented: # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
pagination_style: offset # Options: offset, cursor, page_number
max_page_size: 100 # Range: 10-1000
versioning_strategy: uri # Options: uri, header, query_param, none
filtering_style: query_params # Options: query_params, json_body, graphql_like
cache_strategy: etag # Options: etag, last_modified, cache_control, none
id_format: uuid # Options: uuid, integer, slug, composite
response_envelope: false # Boolean
sec-auth-oauth2-oidc: # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
oauth_flow: authorization_code # Options: authorization_code, client_credentials, device_code, implicit
token_storage: secure_storage # Options: secure_storage, memory_only, encrypted_storage, httponly_cookie
pkce_enabled: true # Boolean
scope_strategy: minimal # Options: minimal, role_based, resource_specific
token_refresh: automatic # Options: automatic, manual, sliding_window
id_token_validation: strict # Options: strict, standard, relaxed
composes:
wraps:
- api-rest-resource-oriented
crud-single-model: # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
api_style: rest # Options: rest, graphql, rpc
validation_strategy: server-side # Options: server-side, client-side, both
soft_delete: false # Boolean
audit_logging: false # Boolean
pagination_default_size: 20 # Range: 10-100
finops-cost-allocation-tags: # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
tagging_strategy: hierarchical # Options: hierarchical, flat, hybrid
enforcement_level: required # Options: required, recommended, optional
cost_allocation_model: showback # Options: chargeback, showback, hybrid
tag_inheritance: true # Boolean
automated_tagging: true # Boolean
release-feature-flags: # Decouple deploy from release; safer experiments; needs kill switches and governance.
flag_storage: config_file # Options: config_file, database, feature_flag_service, environment_variables
evaluation_strategy: simple_boolean # Options: simple_boolean, percentage_rollout, user_targeting, multi_variate
targeting_capability: none # Options: none, user_attributes, context_based, advanced_segments
kill_switch_enabled: true # Boolean
audit_logging: false # Boolean
obs-telemetry-backend--aws-cloudwatch: # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
log_retention_days: 30 # Options: 1, 3, 7, 14, 30, 60, 90, 180, 365
xray_sampling_rate: 0.05 # Options: 0.01, 0.05, 0.1, 0.5, 1.0
obs-golden-signals: # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
latency_percentile: p95 # Options: p50, p95, p99, p99.9
error_rate_threshold: 1_percent # Options: 0.1_percent, 1_percent, 5_percent
saturation_metric: cpu_memory # Options: cpu_memory, queue_depth, connection_pool, disk_io
sli_window: 30_days # Options: 7_days, 30_days, 90_days
alert_severity_levels: critical_warning # Options: critical_only, critical_warning, critical_warning_info
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline: # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
export_backend: otlp # Options: otlp, jaeger, zipkin, prometheus, datadog, newrelic, honeycomb
trace_sampling_strategy: parent-based # Options: always-on, always-off, parent-based, trace-id-ratio
trace_sampling_rate: 1.0 # Range: 0.0-1.0
metrics_export_interval: 60 # Range: 10-300
log_correlation: true # Boolean
resource_detection: true # Boolean
propagation_format: w3c-tracecontext # Options: w3c-tracecontext, b3, jaeger, multi
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails: # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
budget_period: monthly # Options: monthly, quarterly, annual
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert # Options: alert, prevent, throttle
tagging_strategy: mandatory # Options: mandatory, recommended, optional
policy_enforcement: soft # Options: soft, hard, audit
cost_allocation_level: project # Options: project, team, environment, service
ops-slo-error-budgets: # Define SLOs and error budgets to balance reliability and velocity.
slo_target_percentage: 99.9 # Range: 90-99.999
measurement_window_days: 30 # Options: 7, 28, 30, 90
error_budget_policy: halt-deployments # Options: halt-deployments, alert-only, slow-rollouts, require-approval
sli_type: availability # Options: availability, latency, throughput, correctness, composite
alerting_threshold_percentage: 80 # Range: 50-100
P2:
arch-egress-minimization: # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
cdn_strategy: full # Options: full, static-only, none
data_locality: regional # Options: global, regional, single-zone
cross_region_replication: minimal # Options: none, minimal, full
compression_enabled: true # Boolean
static_asset_strategy: cdn_edge # Options: cdn_edge, regional_storage, origin_only
api-versioning-header: # Version via headers/media types; keeps URLs stable; harder to debug and cache.
version_header_name: API-Version # Options: API-Version, X-API-Version, Accept-Version, Custom-Header
version_format: date-based # Options: semantic, date-based, sequential
fallback_behavior: latest-stable # Options: latest-stable, oldest-supported, reject-request
content_negotiation: false # Boolean
deprecation_policy: warning-header # Options: sunset-header, warning-header, both
gov-system-manifest: # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml # Options: yaml, toml, json
manifest_scope: # Options: agent-tools, agent-skills, agent-models, agent-prompts, data_sources, services, external_dependencies
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true # Boolean
ci_validation: required # Options: required, optional, off
drift_policy: fail-build # Options: fail-build, warn-only, off
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks: # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
runbook_format: markdown # Options: markdown, wiki, structured_yaml, ticketing_system
incident_severity_levels: 4 # Options: 3, 4, 5
escalation_policy: tiered # Options: tiered, follow_the_sun, flat, hybrid
automation_integration: manual # Options: manual, semi_automated, fully_automated
review_frequency: quarterly # Options: monthly, quarterly, biannual, post_incident
gov-adrs-mandatory: # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
adr_format: madr # Options: madr, nygard, y-statements, custom
storage_location: docs/adrs # Options: docs/adrs, docs/architecture/decisions, adr, wiki
decision_threshold: significant # Options: all, significant, strategic-only
review_requirement: peer-review # Options: peer-review, architect-approval, team-consensus, none
# ─── warns and cost feasibility ───
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 52,250
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 435
# Total TCO (24mo): $ 62,690
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
language: javascript # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
platform: api # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
features:
caching: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (9 more)
nfr:
availability:
target: 0.999 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
latency:
p95Milliseconds: 50 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (5 more)
p99Milliseconds: 100 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (6 more)
throughput:
peak_query_per_second_read: 20 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (5 more)
peak_query_per_second_write: 10 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
assumptions:
constraints:
saas-providers: []
disallowed-saas-providers: []
ai-inference-platforms: []
disallowed-ai-inference-platforms: []
model-vendors: []
disallowed-model-vendors: []
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, deploy-canary, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (8 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
gdpr: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
gdpr_rtbf: false
ccpa: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
hipaa: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (20 more)
sox: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
operating_model:
on_call: false
deploy_freq: weekly
ops_team_size: 0
single_resource_monthly_ops_usd: 10000
amortization_months: 24
cost:
intent:
priority: minimize-opex
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
preferences:
prefer_free_tier_if_possible: true # db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, caching-required-low-latency, ... (2 more)
prefer_saas_first: false
patterns:
meta:
caching-required-low-latency: {} # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0:
arch-serverless--aws: # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
compute_service: lambda # Options: lambda, fargate
api_gateway: api-gateway-http # Options: api-gateway-http, api-gateway-rest, alb, function-url
database: dynamodb # Options: dynamodb, aurora-serverless, rds-proxy
storage: s3-standard # Options: s3-standard, s3-intelligent-tiering, efs
auth_service: cognito # Options: cognito, lambda-authorizer, iam
event_bus: eventbridge # Options: eventbridge, sns-sqs, kinesis
db-managed-postgres: # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
provider: supabase # Options: supabase, neon, render, railway, digitalocean-app-platform
instance_size: small # Options: micro, small, medium, large
storage_gb: 8 # Range: 1-500
backup_retention_days: 7 # Range: 1-30
connection_pooling: true # Boolean
high_availability: false # Boolean
ssl_mode: require # Options: disable, allow, prefer, require, verify-ca, verify-full
arch-serverless-pay-per-use--aws-lambda: # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
memory_size: 512 # Options: 128, 256, 512, 1024, 2048, 3008
timeout: 30 # Range: 3-900
architecture: x86_64 # Options: x86_64, arm64
provisioned_concurrency: 0 # Range: 0-1000
iac-cloudformation: # AWS-native IaC; deep service coverage; AWS-specific.
stack_naming_convention: project-environment-resource # Options: project-environment-resource, environment-project-resource, flat-naming, custom
change_set_enabled: true # Boolean
termination_protection: false # Boolean
drift_detection: true # Boolean
stack_policy: none # Options: none, protect-all, protect-data-resources, custom
P1:
deploy-canary: # Release to small traffic slice; monitor; gradually increase; needs good metrics and routing controls.
initial_traffic_percentage: 5 # Range: 1-50
increment_percentage: 10 # Range: 5-50
increment_interval_minutes: 15 # Range: 5-120
monitoring_window_minutes: 10 # Range: 5-60
rollback_on_error_threshold: 5 # Range: 0.1-10
success_criteria: error_rate_and_latency # Options: error_rate_only, error_rate_and_latency, error_rate_and_latency_and_saturation, custom_metrics
composes:
layered_after:
- iac-cloudformation
resilience-rate-limiting: # Protect from overload; enforce per-tenant quotas; supports fairness and cost control.
algorithm: token-bucket # Options: token-bucket, leaky-bucket, fixed-window, sliding-window
scope: per-tenant # Options: global, per-tenant, per-user, per-ip
enforcement_point: application # Options: gateway, application, distributed
quota_type: requests # Options: requests, compute-time, data-volume, cost-based
burst_allowance: enabled # Options: enabled, disabled, limited
composes:
wraps:
- api-rest-resource-oriented
resilience-circuit-breaker: # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
failure_threshold: 5 # Range: 1-20
success_threshold: 2 # Range: 1-10
timeout_duration: 60 # Range: 5-300
half_open_max_calls: 1 # Range: 1-10
fallback_strategy: fail_fast # Options: fail_fast, cached_response, default_value, alternative_service
cache-aside: # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
invalidation_strategy: ttl # Options: ttl, event-based, manual, lru
ttl_seconds: 3600 # Range: 1-86400
max_memory_mb: 512 # Range: 128-16384
cache_backend: redis # Options: redis, memcached, in-memory, hazelcast
write_strategy: write-through # Options: write-through, write-behind, invalidate-only
serialization_format: json # Options: json, msgpack, protobuf, pickle
composes:
wraps:
- db-managed-postgres
api-rest-resource-oriented: # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
pagination_style: offset # Options: offset, cursor, page_number
max_page_size: 100 # Range: 10-1000
versioning_strategy: uri # Options: uri, header, query_param, none
filtering_style: query_params # Options: query_params, json_body, graphql_like
cache_strategy: etag # Options: etag, last_modified, cache_control, none
id_format: uuid # Options: uuid, integer, slug, composite
response_envelope: false # Boolean
sec-auth-oauth2-oidc: # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
oauth_flow: authorization_code # Options: authorization_code, client_credentials, device_code, implicit
token_storage: secure_storage # Options: secure_storage, memory_only, encrypted_storage, httponly_cookie
pkce_enabled: true # Boolean
scope_strategy: minimal # Options: minimal, role_based, resource_specific
token_refresh: automatic # Options: automatic, manual, sliding_window
id_token_validation: strict # Options: strict, standard, relaxed
composes:
wraps:
- api-rest-resource-oriented
crud-single-model: # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
api_style: rest # Options: rest, graphql, rpc
validation_strategy: server-side # Options: server-side, client-side, both
soft_delete: false # Boolean
audit_logging: false # Boolean
pagination_default_size: 20 # Range: 10-100
finops-cost-allocation-tags: # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
tagging_strategy: hierarchical # Options: hierarchical, flat, hybrid
enforcement_level: required # Options: required, recommended, optional
cost_allocation_model: showback # Options: chargeback, showback, hybrid
tag_inheritance: true # Boolean
automated_tagging: true # Boolean
release-feature-flags: # Decouple deploy from release; safer experiments; needs kill switches and governance.
flag_storage: config_file # Options: config_file, database, feature_flag_service, environment_variables
evaluation_strategy: simple_boolean # Options: simple_boolean, percentage_rollout, user_targeting, multi_variate
targeting_capability: none # Options: none, user_attributes, context_based, advanced_segments
kill_switch_enabled: true # Boolean
audit_logging: false # Boolean
obs-telemetry-backend--aws-cloudwatch: # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
log_retention_days: 30 # Options: 1, 3, 7, 14, 30, 60, 90, 180, 365
xray_sampling_rate: 0.05 # Options: 0.01, 0.05, 0.1, 0.5, 1.0
obs-golden-signals: # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
latency_percentile: p95 # Options: p50, p95, p99, p99.9
error_rate_threshold: 1_percent # Options: 0.1_percent, 1_percent, 5_percent
saturation_metric: cpu_memory # Options: cpu_memory, queue_depth, connection_pool, disk_io
sli_window: 30_days # Options: 7_days, 30_days, 90_days
alert_severity_levels: critical_warning # Options: critical_only, critical_warning, critical_warning_info
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline: # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
export_backend: otlp # Options: otlp, jaeger, zipkin, prometheus, datadog, newrelic, honeycomb
trace_sampling_strategy: parent-based # Options: always-on, always-off, parent-based, trace-id-ratio
trace_sampling_rate: 1.0 # Range: 0.0-1.0
metrics_export_interval: 60 # Range: 10-300
log_correlation: true # Boolean
resource_detection: true # Boolean
propagation_format: w3c-tracecontext # Options: w3c-tracecontext, b3, jaeger, multi
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails: # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
budget_period: monthly # Options: monthly, quarterly, annual
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert # Options: alert, prevent, throttle
tagging_strategy: mandatory # Options: mandatory, recommended, optional
policy_enforcement: soft # Options: soft, hard, audit
cost_allocation_level: project # Options: project, team, environment, service
ops-slo-error-budgets: # Define SLOs and error budgets to balance reliability and velocity.
slo_target_percentage: 99.9 # Range: 90-99.999
measurement_window_days: 30 # Options: 7, 28, 30, 90
error_budget_policy: halt-deployments # Options: halt-deployments, alert-only, slow-rollouts, require-approval
sli_type: availability # Options: availability, latency, throughput, correctness, composite
alerting_threshold_percentage: 80 # Range: 50-100
P2:
arch-egress-minimization: # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
cdn_strategy: full # Options: full, static-only, none
data_locality: regional # Options: global, regional, single-zone
cross_region_replication: minimal # Options: none, minimal, full
compression_enabled: true # Boolean
static_asset_strategy: cdn_edge # Options: cdn_edge, regional_storage, origin_only
api-versioning-header: # Version via headers/media types; keeps URLs stable; harder to debug and cache.
version_header_name: API-Version # Options: API-Version, X-API-Version, Accept-Version, Custom-Header
version_format: date-based # Options: semantic, date-based, sequential
fallback_behavior: latest-stable # Options: latest-stable, oldest-supported, reject-request
content_negotiation: false # Boolean
deprecation_policy: warning-header # Options: sunset-header, warning-header, both
gov-system-manifest: # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml # Options: yaml, toml, json
manifest_scope: # Options: agent-tools, agent-skills, agent-models, agent-prompts, data_sources, services, external_dependencies
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true # Boolean
ci_validation: required # Options: required, optional, off
drift_policy: fail-build # Options: fail-build, warn-only, off
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks: # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
runbook_format: markdown # Options: markdown, wiki, structured_yaml, ticketing_system
incident_severity_levels: 4 # Options: 3, 4, 5
escalation_policy: tiered # Options: tiered, follow_the_sun, flat, hybrid
automation_integration: manual # Options: manual, semi_automated, fully_automated
review_frequency: quarterly # Options: monthly, quarterly, biannual, post_incident
gov-adrs-mandatory: # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
adr_format: madr # Options: madr, nygard, y-statements, custom
storage_location: docs/adrs # Options: docs/adrs, docs/architecture/decisions, adr, wiki
decision_threshold: significant # Options: all, significant, strategic-only
review_requirement: peer-review # Options: peer-review, architect-approval, team-consensus, none
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 52,250
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 0
# Total OpEx (monthly): $ 435
# Total TCO (24mo): $ 62,690
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✓ PASS
#
# ============================================================
# ============================================================
# Cost Feasibility Analysis (Details)
# ============================================================
#
# Intent: minimize-opex
# Amortization: 24 months
#
# Ops team size: 0 engineers (no ops cost)
#
# Ops Team Cost Algorithm (for reference):
# Formula: ops_team_size × single_resource_monthly_ops_usd × on_call_multiplier × deploy_freq_multiplier
# Based on:
# - Google SRE Handbook (2016): On-call burden = 25-50% FTE overhead
# - DORA State of DevOps (2021): Deploy frequency impact on ops overhead
#
# Calculating costs for 24 selected patterns:
#
# PER-PATTERN COSTS:
# ────────────────────────────────────────────────────────────
#
# 1. arch-serverless--aws (match score: 35.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 2. db-managed-postgres (match score: 32.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 3. arch-serverless-pay-per-use--aws-lambda (match score: 28.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 4. deploy-canary (match score: 28.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 5. caching-required-low-latency (match score: 27.00)
# Adoption: $0.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 6. resilience-rate-limiting (match score: 27.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 7. resilience-circuit-breaker (match score: 26.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 8. cache-aside (match score: 26.00)
# Adoption: $800.0
# Monthly (min): $15.0
# Monthly (expected): $15.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $15.0
#
# 9. api-rest-resource-oriented (match score: 25.00)
# Adoption: $750.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 10. sec-auth-oauth2-oidc (match score: 23.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 11. crud-single-model (match score: 22.00)
# Adoption: $300.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 12. finops-cost-allocation-tags (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 13. arch-egress-minimization (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 14. release-feature-flags (match score: 19.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 15. api-versioning-header (match score: 16.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 16. obs-telemetry-backend--aws-cloudwatch (match score: 14.00)
# Adoption: $500.0
# Monthly (min): $20.0
# Monthly (expected): $20.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $20.0
#
# 17. obs-golden-signals (match score: 12.00)
# Adoption: $3,000.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 18. obs-open-telemetry-baseline (match score: 12.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 19. finops-budget-guardrails (match score: 10.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 20. iac-cloudformation (match score: 10.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 21. ops-runbooks (match score: 8.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 22. ops-slo-error-budgets (match score: 8.00)
# Adoption: $4,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $100.0
#
# 23. gov-system-manifest (match score: 7.00)
# Adoption: $4,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# 24. gov-adrs-mandatory (match score: 7.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# Monthly OpEx: $0.0
#
# Total Monthly OpEx: $435.0
# Monthly operational ceiling: $500 ✓ PASS
# ============================================================
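The totals in the summary can be reproduced from the per-pattern listing above. The sketch below is a hedged illustration, not the compiler's implementation: the `roll_up` helper and the tuple layout are hypothetical, the figures are copied from the listing, and it assumes TCO is CapEx plus the monthly OpEx multiplied by the amortisation window (ops team cost is $0 in this compile).

```python
# (pattern id, one-time adoption USD, expected monthly USD), copied from the listing above.
PATTERN_COSTS = [
    ("arch-serverless--aws", 3500, 0),
    ("db-managed-postgres", 1200, 0),
    ("arch-serverless-pay-per-use--aws-lambda", 1500, 0),
    ("deploy-canary", 3500, 100),
    ("caching-required-low-latency", 0, 0),
    ("resilience-rate-limiting", 1500, 0),
    ("resilience-circuit-breaker", 1500, 0),
    ("cache-aside", 800, 15),
    ("api-rest-resource-oriented", 750, 0),
    ("sec-auth-oauth2-oidc", 3500, 0),
    ("crud-single-model", 300, 0),
    ("finops-cost-allocation-tags", 2500, 0),
    ("arch-egress-minimization", 2500, 0),
    ("release-feature-flags", 2000, 0),
    ("api-versioning-header", 1200, 0),
    ("obs-telemetry-backend--aws-cloudwatch", 500, 20),
    ("obs-golden-signals", 3000, 100),
    ("obs-open-telemetry-baseline", 3500, 100),
    ("finops-budget-guardrails", 2500, 0),
    ("iac-cloudformation", 3500, 0),
    ("ops-runbooks", 2500, 0),
    ("ops-slo-error-budgets", 4500, 100),
    ("gov-system-manifest", 4000, 0),
    ("gov-adrs-mandatory", 2000, 0),
]


def roll_up(costs, amortization_months):
    """Sum adoption and monthly costs, then derive TCO over the window (assumed formula)."""
    capex = sum(adoption for _, adoption, _ in costs)
    monthly_opex = sum(monthly for _, _, monthly in costs)
    tco = capex + amortization_months * monthly_opex
    return capex, monthly_opex, tco


capex, monthly_opex, tco = roll_up(PATTERN_COSTS, amortization_months=24)
print(capex, monthly_opex, tco)  # 52250 435 62690
```

The three printed values match the summary's Total CapEx, Total OpEx (monthly), and Total TCO (24mo).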
graph LR
  n_gov_system_manifest["gov-system-manifest"] -->|co-runs with| n_gov_adrs_mandatory["gov-adrs-mandatory"]
  n_gov_system_manifest["gov-system-manifest"] -->|co-runs with| n_ops_runbooks["ops-runbooks"]
  n_gov_system_manifest["gov-system-manifest"] -->|co-runs with| n_release_feature_flags["release-feature-flags"]
  n_obs_open_telemetry_baseline["obs-open-telemetry-baseline"] -->|co-runs with| n_api_rest_resource_oriented["api-rest-resource-oriented"]
  n_deploy_canary["deploy-canary"] -->|layered after| n_iac_cloudformation["iac-cloudformation"]
  n_gov_system_manifest["gov-system-manifest"] -->|layered after| n_iac_cloudformation["iac-cloudformation"]
  n_obs_golden_signals["obs-golden-signals"] -->|layered after| n_obs_open_telemetry_baseline["obs-open-telemetry-baseline"]
  n_cache_aside["cache-aside"] -->|wraps| n_db_managed_postgres["db-managed-postgres"]
  n_resilience_rate_limiting["resilience-rate-limiting"] -->|wraps| n_api_rest_resource_oriented["api-rest-resource-oriented"]
  n_sec_auth_oauth2_oidc["sec-auth-oauth2-oidc"] -->|wraps| n_api_rest_resource_oriented["api-rest-resource-oriented"]
  n_obs_open_telemetry_baseline["obs-open-telemetry-baseline"] -.->|co-runs with| n_api_graphql_schema_first["api-graphql-schema-first"]
  n_deploy_canary["deploy-canary"] -.->|layered after| n_iac_bicep["iac-bicep"]
  n_deploy_canary["deploy-canary"] -.->|layered after| n_iac_terraform["iac-terraform"]
  n_gov_system_manifest["gov-system-manifest"] -.->|layered after| n_iac_terraform["iac-terraform"]
  n_cache_aside["cache-aside"] -.->|wraps| n_db_key_value__generic["db-key-value--generic"]
  n_cache_aside["cache-aside"] -.->|wraps| n_db_managed_mysql__planetscale["db-managed-mysql--planetscale"]
  n_cache_aside["cache-aside"] -.->|wraps| n_db_nosql_document__generic["db-nosql-document--generic"]
  n_resilience_circuit_breaker["resilience-circuit-breaker"] -.->|wraps| n_async_fire_and_forget["async-fire-and-forget"]
  n_resilience_circuit_breaker["resilience-circuit-breaker"] -.->|wraps| n_sync_request_reply_grpc["sync-request-reply-grpc"]
  n_resilience_circuit_breaker["resilience-circuit-breaker"] -.->|wraps| n_sync_request_reply_rest["sync-request-reply-rest"]
  n_resilience_rate_limiting["resilience-rate-limiting"] -.->|wraps| n_api_graphql_schema_first["api-graphql-schema-first"]
  n_sec_auth_oauth2_oidc["sec-auth-oauth2-oidc"] -.->|wraps| n_api_graphql_schema_first["api-graphql-schema-first"]
  classDef pruned stroke-dasharray:4,color:#aaa,fill:#222,stroke:#888
  class n_api_graphql_schema_first pruned
  class n_async_fire_and_forget pruned
  class n_db_key_value__generic pruned
  class n_db_managed_mysql__planetscale pruned
  class n_db_nosql_document__generic pruned
  class n_iac_bicep pruned
  class n_iac_terraform pruned
  class n_sync_request_reply_grpc pruned
  class n_sync_request_reply_rest pruned
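The edge notation in the composes graph above follows a regular shape: a solid arrow when the target pattern is in the selected set, a dashed arrow when the registry declares the edge but the target was not selected. The sketch below illustrates that rule only. The helper names (`node`, `edge`), the `LABELS` table, and the input shapes are hypothetical, and the real renderer may work differently.

```python
# Human-readable edge labels for the three relation kinds seen in the graph.
LABELS = {
    "co_runs_with": "co-runs with",
    "layered_after": "layered after",
    "wraps": "wraps",
}


def node(pattern_id):
    """Render a Mermaid node: a safe identifier plus the original id as its label."""
    safe_id = "n_" + pattern_id.replace("-", "_")
    return f'{safe_id}["{pattern_id}"]'


def edge(source, relation, target, selected):
    """Solid arrow if the target pattern was selected, dashed otherwise."""
    arrow = "-->" if target in selected else "-.->"
    return f"{node(source)} {arrow}|{LABELS[relation]}| {node(target)}"


# A small subset of the selected patterns and declared edges, taken from the graph above.
selected = {"cache-aside", "db-managed-postgres", "deploy-canary", "iac-cloudformation"}
declared = [
    ("cache-aside", "wraps", "db-managed-postgres"),
    ("cache-aside", "wraps", "db-key-value--generic"),
    ("deploy-canary", "layered_after", "iac-cloudformation"),
    ("deploy-canary", "layered_after", "iac-terraform"),
]

print("graph LR")
for source, relation, target in declared:
    print("  " + edge(source, relation, target, selected))
```

Run against these four declared edges, the sketch emits two solid and two dashed lines in the same notation the graph uses.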
Up to now, every step has been about pattern fit. Step 9 commits to the
cost story so the compiler can run a full feasibility check.
Intent: optimize-tco. Ceilings: $500/month
operational, $1,000 one-time setup. Operating model: 2 dedicated
ops engineers at $10k/mo loaded, on-call enabled (1.5×
multiplier per the Google SRE Book), daily deploys (1.0×
multiplier per DORA State of DevOps), 24-month CapEx amortisation.
cost:
  intent:
    priority: optimize-tco
  ceilings:
    monthly_operational_usd: 500
    one_time_setup_usd: 1000
operating_model:
  ops_team_size: 2
  single_resource_monthly_ops_usd: 10000
  on_call: true
  deploy_freq: daily
  amortization_months: 24
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws
language: javascript
platform: api
features:
caching: true
nfr:
availability:
target: 0.999
latency:
p95Milliseconds: 50
p99Milliseconds: 100
throughput:
peak_query_per_second_read: 20
peak_query_per_second_write: 10
cost:
intent:
priority: optimize-tco
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
operating_model:
ops_team_size: 2
single_resource_monthly_ops_usd: 10000
on_call: true
deploy_freq: daily
amortization_months: 24
The compiler runs a full cost feasibility analysis across three
buckets: Pattern OpEx (sum of each selected
pattern's estimated monthly infra cost), Ops Team
Cost (ops_team_size × single_resource_monthly_ops_usd
× on_call_multiplier × deploy_freq_multiplier), and
CapEx (one-time adoption + setup). For this spec
the breakdown lands at Pattern OpEx $435/mo, Ops Team Cost
$18,000/mo, CapEx $48,250 one-time, TCO over 24 months $490,690.
Pattern OpEx alone fits under the $500/month ceiling, but total
monthly OpEx ($435 + $18,000 = $18,435) and CapEx both exceed their
declared ceilings. The [high] cost_tco_exceeds_ceiling warning fires,
referencing the optimize-tco intent, and lists three
concrete levers the author can pull: raise ceilings, drop specific
high-TCO patterns (the compiler names them), or shorten the
amortisation horizon.
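The roll-up across the three buckets can be checked by hand. The sketch below is illustrative and rests on stated assumptions: the `CostBuckets` and `feasibility` names are hypothetical, the Ops Team Cost is taken as the reported $18,000/mo rather than recomputed from multipliers, the monthly ceiling is assumed to apply to total OpEx (Pattern OpEx plus Ops Team Cost), and TCO is assumed to be CapEx plus total monthly OpEx over the amortisation window.

```python
from dataclasses import dataclass


@dataclass
class CostBuckets:
    pattern_opex_monthly: int  # sum of per-pattern monthly infra cost
    ops_team_monthly: int      # reported ops team cost, taken as given
    capex_one_time: int        # one-time adoption + setup


def feasibility(buckets, monthly_ceiling, setup_ceiling, amortization_months):
    """Compare total OpEx and CapEx to their ceilings and derive TCO (assumed formula)."""
    total_opex = buckets.pattern_opex_monthly + buckets.ops_team_monthly
    tco = buckets.capex_one_time + amortization_months * total_opex
    return {
        "total_opex_monthly": total_opex,
        "tco": tco,
        "opex_within_ceiling": total_opex <= monthly_ceiling,
        "capex_within_ceiling": buckets.capex_one_time <= setup_ceiling,
    }


# Figures reported for this spec: $435/mo, $18,000/mo, $48,250 one-time.
result = feasibility(
    CostBuckets(pattern_opex_monthly=435, ops_team_monthly=18_000, capex_one_time=48_250),
    monthly_ceiling=500,
    setup_ceiling=1_000,
    amortization_months=24,
)
print(result["total_opex_monthly"], result["tco"])  # 18435 490690
print(result["opex_within_ceiling"], result["capex_within_ceiling"])  # False False
```

Under these assumptions the 24-month TCO comes out at $490,690, matching the reported figure, and both ceilings fail.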
# ─── what the compiler FILLED IN as assumptions ───
assumptions:
constraints:
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, deploy-canary, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (8 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
cost:
# ─── Matched Patterns based on input spec ───
# meta = policy gates (always emitted when their feature flag is set)
# P0 = high priority — load-bearing architectural decisions
# P1 = mid priority — operational + observability + security baseline
# P2/P3 = lower priority — refinements + governance + docs
# Override priority by adding `patterns.<id>.recommended_priority: P0` to spec.
patterns:
meta: # (1 pattern)
- caching-required-low-latency # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0: # (4 patterns)
- arch-serverless--aws # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
- db-managed-postgres # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
- arch-serverless-pay-per-use--aws-lambda # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
- iac-cloudformation # AWS-native IaC; deep service coverage; AWS-specific.
P1: # (14 patterns)
- deploy-canary # Release to small traffic slice; monitor; gradually increase; needs good metrics and routing controls.
composes:
layered_after: ['iac-cloudformation']
- resilience-rate-limiting # Protect from overload; enforce per-tenant quotas; supports fairness and cost control.
composes:
wraps: ['api-rest-resource-oriented']
- resilience-circuit-breaker # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
- cache-aside # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
composes:
wraps: ['db-managed-postgres']
- api-rest-resource-oriented # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
- sec-auth-oauth2-oidc # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
composes:
wraps: ['api-rest-resource-oriented']
- crud-single-model # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
- finops-cost-allocation-tags # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
- release-feature-flags # Decouple deploy from release; safer experiments; needs kill switches and governance.
- obs-telemetry-backend--aws-cloudwatch # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
- obs-golden-signals # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
composes:
layered_after: ['obs-open-telemetry-baseline']
- obs-open-telemetry-baseline # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
composes:
co_runs_with: ['api-rest-resource-oriented']
- finops-budget-guardrails # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
- ops-slo-error-budgets # Define SLOs and error budgets to balance reliability and velocity.
P2: # (3 patterns)
- arch-egress-minimization # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
- api-versioning-header # Version via headers/media types; keeps URLs stable; harder to debug and cache.
- gov-system-manifest # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
composes:
layered_after: ['iac-cloudformation']
co_runs_with: ['release-feature-flags', 'gov-adrs-mandatory', 'ops-runbooks']
P3: # (2 patterns)
- ops-runbooks # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
- gov-adrs-mandatory # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
# ─── warns and cost feasibility ───
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: optimize-tco
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 52,250
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 18,000 (2 × $10,000)
# Total OpEx (monthly): $ 18,435
# Total TCO (24mo): $ 494,690
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✗ FAIL
#
# ⚠️ WARNINGS:
# ────────────────────────────────────────────────────────────
# [high] cost_tco_exceeds_ceiling:
# Total cost of ownership over 24 months ($494690) exceeds ceiling ($13000) by $481690 (intent: optimize-tco)
#
# Suggestions:
# - Increase ceilings (capex: $1000, opex: $500)
# - Remove high-TCO patterns: ops-slo-error-budgets, deploy-canary, obs-open-telemetry-baseline
# - Reduce amortization period from 24 months
#
# ============================================================
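The summary figures above can be reproduced by hand. The following is a minimal Python sketch, assuming the TCO ceiling is derived as the one-time ceiling plus the monthly ceiling multiplied by the amortization period (this matches the $13000 figure in the warning). The function and variable names are illustrative and are not the compiler's API.

```python
# Reproduce the Cost Feasibility summary arithmetic.
# All input numbers are copied from the summary block; names are hypothetical.

def tco(one_time_usd: int, monthly_usd: int, months: int) -> int:
    """Total cost of ownership: one-time cost plus monthly cost over the period."""
    return one_time_usd + monthly_usd * months

AMORTIZATION_MONTHS = 24

total_capex = 52_250      # Total CapEx (one-time)
pattern_opex = 435        # Pattern OpEx (monthly)
ops_team_cost = 18_000    # Ops Team Cost (monthly), after multipliers
total_opex = pattern_opex + ops_team_cost

capex_ceiling = 1_000     # cost.ceilings.one_time_setup_usd
opex_ceiling = 500        # cost.ceilings.monthly_operational_usd

total_tco = tco(total_capex, total_opex, AMORTIZATION_MONTHS)
ceiling_tco = tco(capex_ceiling, opex_ceiling, AMORTIZATION_MONTHS)
overrun = total_tco - ceiling_tco

print(f"Total OpEx (monthly): ${total_opex:,}")
print(f"Total TCO (24mo):     ${total_tco:,}")
print(f"TCO ceiling (24mo):   ${ceiling_tco:,}")
print(f"Overrun:              ${overrun:,}")
print("CapEx ceiling:", "PASS" if total_capex <= capex_ceiling else "FAIL")
print("OpEx ceiling: ", "PASS" if total_opex <= opex_ceiling else "FAIL")
```

Both ceilings fail independently, which is why the suggestion to raise ceilings lists the CapEx and OpEx values separately.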
project:
name: aws production api
domain: product-catalog
constraints:
cloud: aws # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
language: javascript # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
platform: api # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
features:
caching: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (9 more)
nfr:
availability:
target: 0.999 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
latency:
p95Milliseconds: 50 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (5 more)
p99Milliseconds: 100 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (6 more)
throughput:
peak_query_per_second_read: 20 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (5 more)
peak_query_per_second_write: 10 # arch-serverless--aws, db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, ... (3 more)
cost:
intent:
priority: optimize-tco
ceilings:
monthly_operational_usd: 500
one_time_setup_usd: 1000
operating_model:
ops_team_size: 2
single_resource_monthly_ops_usd: 10000
on_call: true
deploy_freq: daily
amortization_months: 24
assumptions:
constraints:
saas-providers: []
disallowed-saas-providers: []
ai-inference-platforms: []
disallowed-ai-inference-platforms: []
model-vendors: []
disallowed-model-vendors: []
tenantCount: 1
features:
async_messaging: false # arch-serverless--aws, deploy-canary, arch-serverless-pay-per-use--aws-lambda, ... (9 more)
ai_inference: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
multi_tenancy: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
batch_processing: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
distributed_transactions: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
real_time_streaming: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
vector_search: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (12 more)
document_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
key_value_store: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
graph_database: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
time_series_db: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (10 more)
oltp_workload: true # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
olap_workload: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (11 more)
cold_archive_tiering: false
nfr:
rpo_minutes: 60 # finops-budget-guardrails
rto_minutes: 60 # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (8 more)
data:
retention_days: 90
pii: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
compliance:
gdpr: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
gdpr_rtbf: false
ccpa: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
hipaa: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (20 more)
sox: false # arch-serverless--aws, db-managed-postgres, deploy-canary, ... (21 more)
consistency:
needsReadYourWrites: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
durability:
strict: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (3 more)
security:
auth: oauth2_oidc # arch-serverless--aws, caching-required-low-latency, api-rest-resource-oriented, ... (1 more)
tenant_isolation: n/a # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (2 more)
audit_logging: false # arch-serverless--aws, db-managed-postgres, caching-required-low-latency, ... (6 more)
cost:
preferences:
prefer_free_tier_if_possible: true # db-managed-postgres, arch-serverless-pay-per-use--aws-lambda, caching-required-low-latency, ... (2 more)
prefer_saas_first: false
patterns:
meta:
caching-required-low-latency: {} # Policy pattern that enforces caching when P99 latency target is 100ms or below — the threshold at which direct database queries under any meaningful load are unlikely to reliably satisfy the SLO without a cache layer in front.
P0:
arch-serverless--aws: # Structure the system as stateless, event-driven function handlers backed by AWS managed services (Lambda, API Gateway, DynamoDB, S3, SQS). No persistent servers — each function activates on demand, executes, and terminates. The architectural commitment is to build around events and AWS-managed primitives rather than long-running processes.
compute_service: lambda # Options: lambda, fargate
api_gateway: api-gateway-http # Options: api-gateway-http, api-gateway-rest, alb, function-url
database: dynamodb # Options: dynamodb, aurora-serverless, rds-proxy
storage: s3-standard # Options: s3-standard, s3-intelligent-tiering, efs
auth_service: cognito # Options: cognito, lambda-authorizer, iam
event_bus: eventbridge # Options: eventbridge, sns-sqs, kinesis
db-managed-postgres: # Use low-ops managed Postgres DBaaS providers (e.g., Supabase and managed cloud Postgres offerings) to reduce DB operations overhead; validate quotas, compliance, and scale limits.
provider: supabase # Options: supabase, neon, render, railway, digitalocean-app-platform
instance_size: small # Options: micro, small, medium, large
storage_gb: 8 # Range: 1-500
backup_retention_days: 7 # Range: 1-30
connection_pooling: true # Boolean
high_availability: false # Boolean
ssl_mode: require # Options: disable, allow, prefer, require, verify-ca, verify-full
arch-serverless-pay-per-use--aws-lambda: # Cost optimisation pattern that eliminates idle infrastructure spend by running workloads on AWS Lambda’s per-invocation billing model. Well-suited to bursty or unpredictable workloads where provisioned servers would sit idle most of the time; accepts Lambda cold-start latency as the trade-off.
memory_size: 512 # Options: 128, 256, 512, 1024, 2048, 3008
timeout: 30 # Range: 3-900
architecture: x86_64 # Options: x86_64, arm64
provisioned_concurrency: 0 # Range: 0-1000
iac-cloudformation: # AWS-native IaC; deep service coverage; AWS-specific.
stack_naming_convention: project-environment-resource # Options: project-environment-resource, environment-project-resource, flat-naming, custom
change_set_enabled: true # Boolean
termination_protection: false # Boolean
drift_detection: true # Boolean
stack_policy: none # Options: none, protect-all, protect-data-resources, custom
P1:
deploy-canary: # Release to small traffic slice; monitor; gradually increase; needs good metrics and routing controls.
initial_traffic_percentage: 5 # Range: 1-50
increment_percentage: 10 # Range: 5-50
increment_interval_minutes: 15 # Range: 5-120
monitoring_window_minutes: 10 # Range: 5-60
rollback_on_error_threshold: 5 # Range: 0.1-10
success_criteria: error_rate_and_latency # Options: error_rate_only, error_rate_and_latency, error_rate_and_latency_and_saturation, custom_metrics
composes:
layered_after:
- iac-cloudformation
resilience-rate-limiting: # Protect from overload; enforce per-tenant quotas; supports fairness and cost control.
algorithm: token-bucket # Options: token-bucket, leaky-bucket, fixed-window, sliding-window
scope: per-tenant # Options: global, per-tenant, per-user, per-ip
enforcement_point: application # Options: gateway, application, distributed
quota_type: requests # Options: requests, compute-time, data-volume, cost-based
burst_allowance: enabled # Options: enabled, disabled, limited
composes:
wraps:
- api-rest-resource-oriented
resilience-circuit-breaker: # Stop calls to failing dependencies; half-open probing; fallback. Prevents cascading failures.
failure_threshold: 5 # Range: 1-20
success_threshold: 2 # Range: 1-10
timeout_duration: 60 # Range: 5-300
half_open_max_calls: 1 # Range: 1-10
fallback_strategy: fail_fast # Options: fail_fast, cached_response, default_value, alternative_service
cache-aside: # Application reads cache then DB on miss; writes update DB then invalidates/updates cache explicitly.
invalidation_strategy: ttl # Options: ttl, event-based, manual, lru
ttl_seconds: 3600 # Range: 1-86400
max_memory_mb: 512 # Range: 128-16384
cache_backend: redis # Options: redis, memcached, in-memory, hazelcast
write_strategy: write-through # Options: write-through, write-behind, invalidate-only
serialization_format: json # Options: json, msgpack, protobuf, pickle
composes:
wraps:
- db-managed-postgres
api-rest-resource-oriented: # REST API designed around resources (nouns) manipulated via standard HTTP verbs (GET, POST, PUT, DELETE, PATCH). Resources are identified by stable URLs, responses are cacheable by default, and pagination/filtering are expressed as query parameters. Simpler tooling and stronger HTTP cache semantics than GraphQL; well-suited to public APIs and CRUD-heavy domains.
pagination_style: offset # Options: offset, cursor, page_number
max_page_size: 100 # Range: 10-1000
versioning_strategy: uri # Options: uri, header, query_param, none
filtering_style: query_params # Options: query_params, json_body, graphql_like
cache_strategy: etag # Options: etag, last_modified, cache_control, none
id_format: uuid # Options: uuid, integer, slug, composite
response_envelope: false # Boolean
sec-auth-oauth2-oidc: # Use OAuth2 flows with OIDC identity tokens; standardized claims; delegated auth support.
oauth_flow: authorization_code # Options: authorization_code, client_credentials, device_code, implicit
token_storage: secure_storage # Options: secure_storage, memory_only, encrypted_storage, httponly_cookie
pkce_enabled: true # Boolean
scope_strategy: minimal # Options: minimal, role_based, resource_specific
token_refresh: automatic # Options: automatic, manual, sliding_window
id_token_validation: strict # Options: strict, standard, relaxed
composes:
wraps:
- api-rest-resource-oriented
crud-single-model: # Simple CRUD on one canonical model; lowest complexity; best for straightforward domains.
api_style: rest # Options: rest, graphql, rpc
validation_strategy: server-side # Options: server-side, client-side, both
soft_delete: false # Boolean
audit_logging: false # Boolean
pagination_default_size: 20 # Range: 10-100
finops-cost-allocation-tags: # Tagging/labeling strategy for per-tenant/product cost allocation and chargeback/showback.
tagging_strategy: hierarchical # Options: hierarchical, flat, hybrid
enforcement_level: required # Options: required, recommended, optional
cost_allocation_model: showback # Options: chargeback, showback, hybrid
tag_inheritance: true # Boolean
automated_tagging: true # Boolean
release-feature-flags: # Decouple deploy from release; safer experiments; needs kill switches and governance.
flag_storage: config_file # Options: config_file, database, feature_flag_service, environment_variables
evaluation_strategy: simple_boolean # Options: simple_boolean, percentage_rollout, user_targeting, multi_variate
targeting_capability: none # Options: none, user_attributes, context_based, advanced_segments
kill_switch_enabled: true # Boolean
audit_logging: false # Boolean
obs-telemetry-backend--aws-cloudwatch: # AWS-native observability backend using CloudWatch Metrics, CloudWatch Logs, and AWS X-Ray for distributed tracing. Zero infrastructure to operate; deeply integrated with all AWS services.
log_retention_days: 30 # Options: 1, 3, 7, 14, 30, 60, 90, 180, 365
xray_sampling_rate: 0.05 # Options: 0.01, 0.05, 0.1, 0.5, 1.0
obs-golden-signals: # Monitor latency/traffic/errors/saturation; define SLIs and alert policies.
latency_percentile: p95 # Options: p50, p95, p99, p99.9
error_rate_threshold: 1_percent # Options: 0.1_percent, 1_percent, 5_percent
saturation_metric: cpu_memory # Options: cpu_memory, queue_depth, connection_pool, disk_io
sli_window: 30_days # Options: 7_days, 30_days, 90_days
alert_severity_levels: critical_warning # Options: critical_only, critical_warning, critical_warning_info
composes:
layered_after:
- obs-open-telemetry-baseline
obs-open-telemetry-baseline: # Standardize traces/metrics/log correlation via OpenTelemetry; export to vendor or OSS backends.
export_backend: otlp # Options: otlp, jaeger, zipkin, prometheus, datadog, newrelic, honeycomb
trace_sampling_strategy: parent-based # Options: always-on, always-off, parent-based, trace-id-ratio
trace_sampling_rate: 1.0 # Range: 0.0-1.0
metrics_export_interval: 60 # Range: 10-300
log_correlation: true # Boolean
resource_detection: true # Boolean
propagation_format: w3c-tracecontext # Options: w3c-tracecontext, b3, jaeger, multi
composes:
co_runs_with:
- api-rest-resource-oriented
finops-budget-guardrails: # Implement budgets, alerts, tagging, and policy-as-code guardrails to enforce cost ceilings and prevent runaway spend.
budget_period: monthly # Options: monthly, quarterly, annual
alert_thresholds:
- 50
- 80
- 100
enforcement_action: alert # Options: alert, prevent, throttle
tagging_strategy: mandatory # Options: mandatory, recommended, optional
policy_enforcement: soft # Options: soft, hard, audit
cost_allocation_level: project # Options: project, team, environment, service
ops-slo-error-budgets: # Define SLOs and error budgets to balance reliability and velocity.
slo_target_percentage: 99.9 # Range: 90-99.999
measurement_window_days: 30 # Options: 7, 28, 30, 90
error_budget_policy: halt-deployments # Options: halt-deployments, alert-only, slow-rollouts, require-approval
sli_type: availability # Options: availability, latency, throughput, correctness, composite
alerting_threshold_percentage: 80 # Range: 50-100
P2:
arch-egress-minimization: # Reduce cloud egress cost by co-locating compute and data, using CDNs, and avoiding cross-region data flows.
cdn_strategy: full # Options: full, static-only, none
data_locality: regional # Options: global, regional, single-zone
cross_region_replication: minimal # Options: none, minimal, full
compression_enabled: true # Boolean
static_asset_strategy: cdn_edge # Options: cdn_edge, regional_storage, origin_only
api-versioning-header: # Version via headers/media types; keeps URLs stable; harder to debug and cache.
version_header_name: API-Version # Options: API-Version, X-API-Version, Accept-Version, Custom-Header
version_format: date-based # Options: semantic, date-based, sequential
fallback_behavior: latest-stable # Options: latest-stable, oldest-supported, reject-request
content_negotiation: false # Boolean
deprecation_policy: warning-header # Options: sunset-header, warning-header, both
gov-system-manifest: # Pin and govern the inventory of components (agent-tools, agent-skills, agent-models, agent-prompts, services, data sources, external dependencies) the system depends on at a declared manifest path; CI validates on every PR and drift between manifest and built system fails the build.
manifest_path: docs/architecture/manifest.yaml
manifest_format: yaml # Options: yaml, toml, json
manifest_scope: # Options: agent-tools, agent-skills, agent-models, agent-prompts, data_sources, services, external_dependencies
- agent-tools
- agent-skills
- agent-models
- agent-prompts
pin_versions: true # Boolean
ci_validation: required # Options: required, optional, off
drift_policy: fail-build # Options: fail-build, warn-only, off
composes:
layered_after:
- iac-cloudformation
co_runs_with:
- release-feature-flags
- gov-adrs-mandatory
- ops-runbooks
P3:
ops-runbooks: # Standard runbooks for incidents and routine ops; reduces MTTR and on-call stress.
runbook_format: markdown # Options: markdown, wiki, structured_yaml, ticketing_system
incident_severity_levels: 4 # Options: 3, 4, 5
escalation_policy: tiered # Options: tiered, follow_the_sun, flat, hybrid
automation_integration: manual # Options: manual, semi_automated, fully_automated
review_frequency: quarterly # Options: monthly, quarterly, biannual, post_incident
gov-adrs-mandatory: # Record architecture decisions and tradeoffs; improves continuity; keep lightweight.
adr_format: madr # Options: madr, nygard, y-statements, custom
storage_location: docs/adrs # Options: docs/adrs, docs/architecture/decisions, adr, wiki
decision_threshold: significant # Options: all, significant, strategic-only
review_requirement: peer-review # Options: peer-review, architect-approval, team-consensus, none
# ============================================================
# Cost Feasibility Analysis (Summary)
# ============================================================
#
# Intent: optimize-tco
# Amortization: 24 months
# Total Patterns Selected: 24
#
# COST BREAKDOWN:
# ────────────────────────────────────────────────────────────
# Total CapEx (one-time): $ 52,250
# Pattern OpEx (monthly): $ 435
# Ops Team Cost (monthly): $ 18,000 (2 × $10,000)
# Total OpEx (monthly): $ 18,435
# Total TCO (24mo): $ 494,690
#
# COST CEILINGS:
# ────────────────────────────────────────────────────────────
# CapEx Ceiling: $ 1,000 ✗ FAIL
# OpEx Ceiling (monthly): $ 500 ✗ FAIL
#
# ⚠️ WARNINGS:
# ────────────────────────────────────────────────────────────
# [high] cost_tco_exceeds_ceiling:
# Total cost of ownership over 24 months ($494690) exceeds ceiling ($13000) by $481690 (intent: optimize-tco)
#
# Suggestions:
# - Increase ceilings (capex: $1000, opex: $500)
# - Remove high-TCO patterns: ops-slo-error-budgets, deploy-canary, obs-open-telemetry-baseline
# - Reduce amortization period from 24 months
#
# ============================================================
# ============================================================
# Cost Feasibility Analysis (Details)
# ============================================================
#
# Intent: optimize-tco
# Amortization: 24 months
#
# Ops Team Cost Breakdown:
# Base: 2 engineers × $10,000/month = $20,000
# On-call multiplier: 1.5x (on-call burden)
# Deploy frequency multiplier: 0.6x (deploy_freq: daily, high automation)
# Adjusted ops cost: $20,000 × 1.5 × 0.6 = $18,000/month
#
# Deploy Frequency Options (DORA State of DevOps):
# on-demand: 0.5x (very high automation)
# daily: 0.6x (high automation)
# weekly: 0.8x (moderate automation)
# biweekly: 0.9x (manual processes)
# monthly: 1.0x (very manual)
# quarterly: 1.1x (extremely manual)
#
#
# Ops Team Cost Algorithm (for reference):
# Formula: ops_team_size × single_resource_monthly_ops_usd × on_call_multiplier × deploy_freq_multiplier
# Based on:
# - Google SRE Handbook (2016): On-call burden = 25-50% FTE overhead
# - DORA State of DevOps (2021): Deploy frequency impact on ops overhead
#
# Calculating costs for 24 selected patterns:
#
# PER-PATTERN COSTS:
# ────────────────────────────────────────────────────────────
#
# 1. arch-serverless--aws (match score: 35.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,500.0 + ($0.0 × 24) = $3,500.0
#
# 2. db-managed-postgres (match score: 32.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $1,200.0 + ($0.0 × 24) = $1,200.0
#
# 3. arch-serverless-pay-per-use--aws-lambda (match score: 28.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $1,500.0 + ($0.0 × 24) = $1,500.0
#
# 4. deploy-canary (match score: 28.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,500.0 + ($100.0 × 24) = $5,900.0
#
# 5. caching-required-low-latency (match score: 27.00)
# Adoption: $0.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $0.0 + ($0.0 × 24) = $0.0
#
# 6. resilience-rate-limiting (match score: 27.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $1,500.0 + ($0.0 × 24) = $1,500.0
#
# 7. resilience-circuit-breaker (match score: 26.00)
# Adoption: $1,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $1,500.0 + ($0.0 × 24) = $1,500.0
#
# 8. cache-aside (match score: 26.00)
# Adoption: $800.0
# Monthly (min): $15.0
# Monthly (expected): $15.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $800.0 + ($15.0 × 24) = $1,160.0
#
# 9. api-rest-resource-oriented (match score: 25.00)
# Adoption: $750.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $750.0 + ($0.0 × 24) = $750.0
#
# 10. sec-auth-oauth2-oidc (match score: 23.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,500.0 + ($0.0 × 24) = $3,500.0
#
# 11. crud-single-model (match score: 22.00)
# Adoption: $300.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $300.0 + ($0.0 × 24) = $300.0
#
# 12. finops-cost-allocation-tags (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,500.0 + ($0.0 × 24) = $2,500.0
#
# 13. arch-egress-minimization (match score: 21.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,500.0 + ($0.0 × 24) = $2,500.0
#
# 14. release-feature-flags (match score: 19.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,000.0 + ($0.0 × 24) = $2,000.0
#
# 15. api-versioning-header (match score: 16.00)
# Adoption: $1,200.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $1,200.0 + ($0.0 × 24) = $1,200.0
#
# 16. obs-telemetry-backend--aws-cloudwatch (match score: 14.00)
# Adoption: $500.0
# Monthly (min): $20.0
# Monthly (expected): $20.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $500.0 + ($20.0 × 24) = $980.0
#
# 17. obs-golden-signals (match score: 12.00)
# Adoption: $3,000.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,000.0 + ($100.0 × 24) = $5,400.0
#
# 18. obs-open-telemetry-baseline (match score: 12.00)
# Adoption: $3,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,500.0 + ($100.0 × 24) = $5,900.0
#
# 19. finops-budget-guardrails (match score: 10.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,500.0 + ($0.0 × 24) = $2,500.0
#
# 20. iac-cloudformation (match score: 10.00)
# Adoption: $3,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $3,500.0 + ($0.0 × 24) = $3,500.0
#
# 21. ops-runbooks (match score: 8.00)
# Adoption: $2,500.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,500.0 + ($0.0 × 24) = $2,500.0
#
# 22. ops-slo-error-budgets (match score: 8.00)
# Adoption: $4,500.0
# Monthly (min): $100.0
# Monthly (expected): $100.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $4,500.0 + ($100.0 × 24) = $6,900.0
#
# 23. gov-system-manifest (match score: 7.00)
# Adoption: $4,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $4,000.0 + ($0.0 × 24) = $4,000.0
#
# 24. gov-adrs-mandatory (match score: 7.00)
# Adoption: $2,000.0
# Monthly (min): $0.0
# Monthly (expected): $0.0
# Ops cost: $0 (no infrastructure)
# ──────────────────────────────────────
# TCO (24mo): $2,000.0 + ($0.0 × 24) = $2,000.0
#
# Total TCO (24mo): $494,690.0
# Monthly operational ceiling: $500 ✗ FAIL
# One-time setup ceiling: $1,000 ✗ FAIL
# ============================================================
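The ops team cost formula quoted in the details block can be sketched in a few lines of Python. The multipliers are copied from the report above; the 1.0 factor used when on_call is false is an assumption (the report only shows the on-call case), and the function name is hypothetical rather than part of the compiler.

```python
# Sketch of the ops team cost formula from the details block:
# ops_team_size x single_resource_monthly_ops_usd
#   x on_call_multiplier x deploy_freq_multiplier

# Deploy frequency multipliers as listed in the report.
DEPLOY_FREQ_MULTIPLIER = {
    "on-demand": 0.5,
    "daily": 0.6,
    "weekly": 0.8,
    "biweekly": 0.9,
    "monthly": 1.0,
    "quarterly": 1.1,
}

ON_CALL_MULTIPLIER = 1.5  # from the report (on-call burden)


def ops_team_cost(ops_team_size: int,
                  single_resource_monthly_ops_usd: float,
                  on_call: bool,
                  deploy_freq: str) -> float:
    """Monthly ops team cost after the on-call and deploy-frequency adjustments."""
    base = ops_team_size * single_resource_monthly_ops_usd
    # Assumption: no on-call adjustment (factor 1.0) when on_call is false.
    on_call_factor = ON_CALL_MULTIPLIER if on_call else 1.0
    return base * on_call_factor * DEPLOY_FREQ_MULTIPLIER[deploy_freq]


# The operating_model declared in this spec.
declared = ops_team_cost(2, 10_000, on_call=True, deploy_freq="daily")

# What the total would look like with ops_team_size: 0.
undeclared = ops_team_cost(0, 10_000, on_call=True, deploy_freq="daily")

PATTERN_OPEX = 435  # Pattern OpEx (monthly) from the summary
print(f"declared:   ${round(declared + PATTERN_OPEX):,}/month")
print(f"undeclared: ${round(undeclared + PATTERN_OPEX):,}/month")
```

With a team size of zero, monthly OpEx collapses to the pattern OpEx alone, which would sit under the $500 monthly ceiling even though the real operating cost is far higher.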
Without operating_model, the compiler would have
defaulted ops_team_size: 0, silently
understating real TCO. Declaring it surfaced the mismatch.

The author makes three decisions before approving the architecture.
resilience-rate-limiting,
resilience-circuit-breaker, and
cache-aside all sit at P1 by registry default,
a reasonable generic recommendation. For this team, however,
they are load-bearing: rate limiting and the circuit breaker
protect the database and downstream services from overload
and cascading failures, and the cache-aside read path is
on the critical path for the p99 ≤ 100ms latency budget.
The team bumps all three to P0 via the bucket-grouped
patterns.P0.<pid>: {} form (an empty body
means a priority-only override; the registry
defaultConfig is backfilled into the approved
architecture).

The team also narrows gov-system-manifest.manifest_scope
to the non-agentic classes. The pattern's
registry default pins the four agentic-flavoured classes
(agent-tools, agent-skills, agent-models,
agent-prompts) because the most common adopter is
an agentic system. This spec is non-agentic, so the four
agentic classes don't apply. The team swaps to
[data_sources, services, external_dependencies]
so the CI drift check exercises the right surface (e.g.
the AWS managed-Postgres endpoint, the third-party
licensing SaaS, the inbound webhook providers). The
override rides on the same bucket-grouped form:
patterns.P2.gov-system-manifest.manifest_scope:
[data_sources, services, external_dependencies].

The approved architecture follows the compiling-architecture skill's promotion contract:
EVERY key under assumptions.* is lifted into the
top-level spec body so the result has no assumptions
block. The # STATUS: APPROVED comment header is
prepended at the top. The footer below the panel verifies the
handoff: re-compiling the promoted architecture must exit
0 (idempotent). Approver name and date are
anonymised placeholders here; in practice the author fills them
in at commit time.

# STATUS: APPROVED
# Approved by: <architect-on-record>
# Approved at: <YYYY-MM-DD>
#
# This header is consumed by skills/implementing-architecture/SKILL.md
# to verify the architecture is handoff-ready. Recompilation of the
# underlying spec invalidates this approval — fresh review required.
project:
  name: aws production api
  domain: product-catalog
constraints:
  cloud: aws
  language: javascript
  platform: api
  features:
    caching: true
    async_messaging: false
    ai_inference: false
    multi_tenancy: false
    batch_processing: false
    distributed_transactions: false
    real_time_streaming: false
    vector_search: false
    document_store: false
    key_value_store: false
    graph_database: false
    time_series_db: false
    oltp_workload: true
    olap_workload: false
    cold_archive_tiering: false
  saas-providers: []
  disallowed-saas-providers: []
  ai-inference-platforms: []
  disallowed-ai-inference-platforms: []
  model-vendors: []
  disallowed-model-vendors: []
  tenantCount: 1
cost:
  intent:
    priority: optimize-tco
  ceilings:
    monthly_operational_usd: 20000
    one_time_setup_usd: 50000
  preferences:
    prefer_free_tier_if_possible: true
    prefer_saas_first: false
operating_model:
  ops_team_size: 2
  single_resource_monthly_ops_usd: 10000
  on_call: true
  deploy_freq: daily
  amortization_months: 24
nfr:
  availability:
    target: 0.999
  latency:
    p95Milliseconds: 50
    p99Milliseconds: 100
  throughput:
    peak_query_per_second_read: 20
    peak_query_per_second_write: 10
  rpo_minutes: 60
  rto_minutes: 60
  data:
    retention_days: 90
    pii: false
    compliance:
      gdpr: false
      gdpr_rtbf: false
      ccpa: false
      hipaa: false
      sox: false
  consistency:
    needsReadYourWrites: false
  durability:
    strict: false
  security:
    auth: oauth2_oidc
    tenant_isolation: n/a
    audit_logging: false
patterns:
  meta:
    caching-required-low-latency: {}
  P0:
    arch-serverless--aws:
      compute_service: lambda
      api_gateway: api-gateway-http
      database: dynamodb
      storage: s3-standard
      auth_service: cognito
      event_bus: eventbridge
    db-managed-postgres:
      provider: supabase
      instance_size: small
      storage_gb: 8
      backup_retention_days: 7
      connection_pooling: true
      high_availability: false
      ssl_mode: require
    arch-serverless-pay-per-use--aws-lambda:
      memory_size: 512
      timeout: 30
      architecture: x86_64
      provisioned_concurrency: 0
      reserved_concurrency: null
    iac-cloudformation:
      stack_naming_convention: project-environment-resource
      change_set_enabled: true
      termination_protection: false
      drift_detection: true
      stack_policy: none
    resilience-rate-limiting:
      algorithm: token-bucket
      scope: per-tenant
      enforcement_point: application
      quota_type: requests
      burst_allowance: enabled
    resilience-circuit-breaker:
      failure_threshold: 5
      success_threshold: 2
      timeout_duration: 60
      half_open_max_calls: 1
      fallback_strategy: fail_fast
    cache-aside:
      invalidation_strategy: ttl
      ttl_seconds: 3600
      max_memory_mb: 512
      cache_backend: redis
      write_strategy: write-through
      serialization_format: json
  P1:
    deploy-canary:
      initial_traffic_percentage: 5
      increment_percentage: 10
      increment_interval_minutes: 15
      monitoring_window_minutes: 10
      rollback_on_error_threshold: 5
      success_criteria: error_rate_and_latency
      composes:
        layered_after:
          - iac-cloudformation
    api-rest-resource-oriented:
      pagination_style: offset
      max_page_size: 100
      versioning_strategy: uri
      filtering_style: query_params
      cache_strategy: etag
      id_format: uuid
      response_envelope: false
    sec-auth-oauth2-oidc:
      oauth_flow: authorization_code
      token_storage: secure_storage
      pkce_enabled: true
      scope_strategy: minimal
      token_refresh: automatic
      id_token_validation: strict
      composes:
        wraps:
          - api-rest-resource-oriented
    crud-single-model:
      api_style: rest
      validation_strategy: server-side
      soft_delete: false
      audit_logging: false
      pagination_default_size: 20
    finops-cost-allocation-tags:
      tagging_strategy: hierarchical
      enforcement_level: required
      cost_allocation_model: showback
      tag_inheritance: true
      automated_tagging: true
    release-feature-flags:
      flag_storage: config_file
      evaluation_strategy: simple_boolean
      targeting_capability: none
      kill_switch_enabled: true
      audit_logging: false
    obs-telemetry-backend--aws-cloudwatch:
      log_retention_days: 30
      xray_sampling_rate: 0.05
    obs-golden-signals:
      latency_percentile: p95
      error_rate_threshold: 1_percent
      saturation_metric: cpu_memory
      sli_window: 30_days
      alert_severity_levels: critical_warning
      composes:
        layered_after:
          - obs-open-telemetry-baseline
    obs-open-telemetry-baseline:
      export_backend: otlp
      trace_sampling_strategy: parent-based
      trace_sampling_rate: 1.0
      metrics_export_interval: 60
      log_correlation: true
      resource_detection: true
      propagation_format: w3c-tracecontext
      composes:
        co_runs_with:
          - api-rest-resource-oriented
    finops-budget-guardrails:
      budget_period: monthly
      alert_thresholds:
        - 50
        - 80
        - 100
      enforcement_action: alert
      tagging_strategy: mandatory
      policy_enforcement: soft
      cost_allocation_level: project
    ops-slo-error-budgets:
      slo_target_percentage: 99.9
      measurement_window_days: 30
      error_budget_policy: halt-deployments
      sli_type: availability
      alerting_threshold_percentage: 80
  P2:
    arch-egress-minimization:
      cdn_strategy: full
      data_locality: regional
      cross_region_replication: minimal
      compression_enabled: true
      static_asset_strategy: cdn_edge
    api-versioning-header:
      version_header_name: API-Version
      version_format: date-based
      fallback_behavior: latest-stable
      content_negotiation: false
      deprecation_policy: warning-header
    gov-system-manifest:
      manifest_scope:
        - data_sources
        - services
        - external_dependencies
      manifest_path: docs/architecture/manifest.yaml
      manifest_format: yaml
      pin_versions: true
      ci_validation: required
      drift_policy: fail-build
  P3:
    ops-runbooks:
      runbook_format: markdown
      incident_severity_levels: 4
      escalation_policy: tiered
      automation_integration: manual
      review_frequency: quarterly
    gov-adrs-mandatory:
      adr_format: madr
      storage_location: docs/adrs
      decision_threshold: significant
      review_requirement: peer-review
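The approval header at the top of the file is what marks the spec as handoff-ready. The source does not show how skills/implementing-architecture/SKILL.md performs that verification, so the following is only a minimal Python sketch of one way such a check could look. The function names (read_header, is_handoff_ready, unfilled_placeholders) and the rule that a value wrapped in angle brackets counts as an unfilled placeholder are assumptions made for illustration, not part of the skill's contract.

```python
def read_header(text: str) -> dict[str, str]:
    """Parse 'Key: value' pairs from the leading block of '#' comment lines."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        key, sep, value = line.lstrip("#").partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_handoff_ready(text: str) -> bool:
    """True when the header carries STATUS: APPROVED (hypothetical check)."""
    return read_header(text).get("STATUS") == "APPROVED"


def unfilled_placeholders(header: dict[str, str]) -> list[str]:
    """ASSUMPTION: a value of the form <...> is a placeholder not yet filled in."""
    return [k for k, v in header.items() if v.startswith("<") and v.endswith(">")]


# First lines of the approved spec, shortened for the example.
SAMPLE = """# STATUS: APPROVED
# Approved by: <architect-on-record>
# Approved at: <YYYY-MM-DD>
#
# This header is consumed by skills/implementing-architecture/SKILL.md
project:
  name: aws production api
"""

header = read_header(SAMPLE)
print(is_handoff_ready(SAMPLE), unfilled_placeholders(header))
```

Run against the demo file as shown, a check like this would report the spec as approved while still flagging the two anonymised placeholder fields that the author fills in at commit time.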
<app-repo>/docs/architecture/architecture.yaml
and commit it.
skills/implementing-architecture/SKILL.md reads
from that path. Recompilation of the underlying spec
invalidates the approval header, so a fresh review is required.

The Progressive-Refinement walkthrough above ends with an approved architecture. Several facilities of the compiler and the adjacent skills were intentionally out of scope:
optimize-tco. The lever choice is an architectural
decision the compiler intentionally leaves to the author.

features.caching: true). The harder case is when
a hard NFR (e.g. 10ms p95 latency, or a very high availability
requirement combined with on-prem deployment) cannot be
satisfied by any registered pattern at all. In that case the compiler
exits with status 1 and names the pattern whose
supports_nfr gate failed. The agentic demo's
Step 2 shows a structurally similar rejection (n8n + multi-agent
hosting mismatch).

patterns.<bucket>.<pid>:
{} form. The same shape with a non-empty body overrides
per-pattern defaultConfig field values (e.g.
patterns.P0.cache-aside.invalidation_strategy:
explicit) or NFR contributions. Useful when registry
defaults don't fit your data-handling, latency, or operational
requirements. See
the agentic
demo's Step 9 for the combined priority-promotion + config-override
example with rich BEFORE/AFTER rendering.

rejected-patterns.yaml side file,
produced by verbose mode (-v), lists every pattern
the compiler considered AND dropped, with per-pattern reasoning
(which activation gate failed, which conflict fired, which NFR
contribution couldn't be met). It is the flip side of "why was this
pattern selected?" and equally useful for debugging.

skills/implementing-architecture skill reads it
and scaffolds the project. The composes graph (visible in
Step 8 above) is the implementing agent's primary signal for
build order and runtime layering. See
skills/implementing-architecture/SKILL.md for the
workflow that picks up where this demo ends.

compiling-architecture skill documents how to
read an existing prototype's choices into the spec before
compiling. See its "Brownfield" section for the protocol.

Every selected pattern carries
reference_design_url and reference_developer_doc_url
fields under patterns/<pid>.json pointing at the
canonical product / SDK docs for that technology. The implementing
agent uses these to ground its scaffolding decisions.
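The list above notes that the composes graph is the implementing agent's primary signal for build order. As a rough illustration, the composes entries from the approved spec can be turned into one possible build order with a topological sort. This is a sketch under stated assumptions: it treats layered_after and wraps as "the named pattern is built first" and treats co_runs_with as imposing no order. The source does not define these semantics, and build_order is a hypothetical helper, not part of either skill.

```python
from graphlib import TopologicalSorter

# composes entries copied from the approved spec in this demo.
COMPOSES = {
    "deploy-canary": {"layered_after": ["iac-cloudformation"]},
    "sec-auth-oauth2-oidc": {"wraps": ["api-rest-resource-oriented"]},
    "obs-golden-signals": {"layered_after": ["obs-open-telemetry-baseline"]},
    "obs-open-telemetry-baseline": {"co_runs_with": ["api-rest-resource-oriented"]},
}

# ASSUMPTION: only these edge kinds constrain build order; the target comes first.
ORDERING_EDGES = ("layered_after", "wraps")


def build_order(composes: dict[str, dict[str, list[str]]]) -> list[str]:
    """Return one valid build order: every pattern appears after its targets."""
    sorter = TopologicalSorter()
    for pid, edges in composes.items():
        sorter.add(pid)  # register the pattern even if it has no ordering edges
        for kind in ORDERING_EDGES:
            for target in edges.get(kind, []):
                sorter.add(pid, target)  # pid depends on target
    return list(sorter.static_order())


order = build_order(COMPOSES)
print(order)
```

Several orders satisfy these constraints, so the exact sequence printed is not significant. What matters is that iac-cloudformation precedes deploy-canary, api-rest-resource-oriented precedes sec-auth-oauth2-oidc, and obs-open-telemetry-baseline precedes obs-golden-signals.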