Architecture Decision Records

Architecture Decision Records (ADRs) capture the “why” behind choices. This practice aligns with the Arcadia method’s emphasis on “capitalisation” — retaining decisions and justifications for long-term traceability.

ADR-001: Cloudflare Access as sole auth plane

Context: The current system has two incompatible auth systems — Passport.js sessions (volatile, in-memory) and Supabase Auth (JWT, frontend only). Staff access is internal-only.

Decision: Use Cloudflare Access with Google SSO as the single authentication layer.

Consequences:

  • No auth UI in the app — Access gate handles login before the app loads
  • Every request carries the Cf-Access-Jwt-Assertion header; Worker middleware validates it and extracts identity (see the sketch below)
  • No RBAC in v1 — all authenticated staff have equal access (deferred per SOW)
  • Service-to-service calls (cron, queues) use Access service tokens
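
A minimal sketch of that middleware, assuming the jose library for JWKS verification; the team domain and application AUD tag are placeholders that come from the Access application configuration:

```ts
import { createRemoteJWKSet, jwtVerify } from "jose";

// Placeholders: real values come from the Cloudflare Access app config.
const TEAM_DOMAIN = "https://example-team.cloudflareaccess.com";
const POLICY_AUD = "access-app-aud-tag";

// Access publishes its signing keys at /cdn-cgi/access/certs.
const JWKS = createRemoteJWKSet(new URL(`${TEAM_DOMAIN}/cdn-cgi/access/certs`));

// Validates the Access JWT on every request and returns the staff identity.
export async function requireIdentity(request: Request): Promise<string> {
  const token = request.headers.get("Cf-Access-Jwt-Assertion");
  if (!token) throw new Error("missing Cf-Access-Jwt-Assertion header");

  // Checks signature, expiry, issuer, and audience in one call.
  const { payload } = await jwtVerify(token, JWKS, {
    issuer: TEAM_DOMAIN,
    audience: POLICY_AUD,
  });
  return payload.email as string; // identity claim set by Access
}
```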

ADR-002: GraphQL-first Shopify integration

Context: Shopify states the REST Admin API is “legacy” as of October 2024 and recommends GraphQL. The current codebase uses REST in 6 independent implementations.

Decision: Use GraphQL Admin API as the primary integration, with bulk operations for high-volume data sync.

Consequences:

  • Bulk operations are async (poll-based), requiring queue orchestration (see the polling sketch below)
  • Concurrency limits apply: one bulk query per shop on API version 2024-04 (five on 2026-01+)
  • REST retained only for endpoints without GraphQL equivalents, isolated behind the shared Shopify client
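
A sketch of that start-and-poll flow, with a thin adminGraphql helper standing in for the shared Shopify client; the shop domain, token handling, and query body are illustrative:

```ts
// Stand-in for the shared Shopify client; shop domain and token are placeholders.
async function adminGraphql(query: string): Promise<any> {
  const res = await fetch(
    "https://example-shop.myshopify.com/admin/api/2024-04/graphql.json",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Shopify-Access-Token": "<token>", // hypothetical
      },
      body: JSON.stringify({ query }),
    },
  );
  return res.json();
}

// Kick off an async bulk query; Shopify runs one per shop on 2024-04.
export async function startBulkOrderSync(): Promise<string> {
  const start = await adminGraphql(`
    mutation {
      bulkOperationRunQuery(query: "{ orders { edges { node { id createdAt } } } }") {
        bulkOperation { id status }
        userErrors { field message }
      }
    }`);
  // Production code should surface userErrors; elided here.
  return start.data.bulkOperationRunQuery.bulkOperation.id;
}

// Poll until COMPLETED; url then points at the JSONL result file.
export async function pollBulkOperation(): Promise<string | null> {
  const res = await adminGraphql(`{ currentBulkOperation { id status url } }`);
  const op = res.data.currentBulkOperation;
  return op?.status === "COMPLETED" ? op.url : null;
}
```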

ADR-003: No stored procedures or materialized views

Context: Current system uses PostgreSQL RPCs (get_tag_categories(), refresh_tag_performance_aggregate(), etc.) and materialized views. The rewrite moves all business logic to application code for portability and testability.

Decision: Move all business logic into application code. Replace materialized views with DuckDB aggregation over R2 Iceberg tables (measurement.performance, measurement.sales) triggered by CF Workflow cron jobs.

Consequences:

  • All RPC logic migrated to TypeScript in Workers
  • Mart tables are regular tables, rebuilt by background jobs (workflow executions; see the sketch below)
  • A last_refreshed timestamp tracks freshness
  • More app code, but fully testable and portable
  • Cross-reference: naming conventions and relations are documentation, not runtime enforcement
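
A sketch of one such background job as a Cloudflare Workflow. The class and helper names (MartRefreshWorkflow, runDuckDbAggregation, replaceMartRows, setLastRefreshed) are hypothetical stand-ins for wherever the DuckDB aggregation and PG writes actually live:

```ts
import { WorkflowEntrypoint, WorkflowEvent, WorkflowStep } from "cloudflare:workers";

// Hypothetical helpers: aggregation and load logic live in application code.
declare function runDuckDbAggregation(mart: string): Promise<unknown[]>;
declare function replaceMartRows(mart: string, rows: unknown[]): Promise<void>;
declare function setLastRefreshed(mart: string, at: string): Promise<void>;

type Env = Record<string, unknown>; // bindings elided

export class MartRefreshWorkflow extends WorkflowEntrypoint<Env, { mart: string }> {
  async run(event: WorkflowEvent<{ mart: string }>, step: WorkflowStep) {
    // Each step is retried and replayed independently by the Workflow engine.
    const rows = await step.do("aggregate", () =>
      runDuckDbAggregation(event.payload.mart),
    );
    await step.do("load", () => replaceMartRows(event.payload.mart, rows));
    await step.do("stamp last_refreshed", () =>
      setLastRefreshed(event.payload.mart, new Date().toISOString()),
    );
  }
}
```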

ADR-004: Foreign keys as documentation, not enforcement

Context: PlanetScale supports foreign keys but recommends carefully weighing trade-offs — constraints increase locking and can create friction in high-concurrency workloads and online schema changes.

Decision: Define relationships in Drizzle schema (for type safety and documentation) but do not enforce FK constraints at the database level.
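
A sketch of the pattern in Drizzle, assuming the Postgres flavor (this spec refers to PG throughout) and the assets/products entities from the cross-reference below; column shapes are hypothetical. The key detail: product_id carries no .references() call, so no constraint reaches the DDL, while relations() still gives typed joins:

```ts
import { pgTable, bigint, varchar, timestamp } from "drizzle-orm/pg-core";
import { relations } from "drizzle-orm";

export const products = pgTable("products", {
  id: bigint("id", { mode: "number" }).primaryKey(),
  title: varchar("title", { length: 255 }).notNull(),
});

export const assets = pgTable("assets", {
  id: bigint("id", { mode: "number" }).primaryKey(),
  // No .references(products.id): documented relation, app-enforced integrity.
  productId: bigint("product_id", { mode: "number" }).notNull(),
  createdAt: timestamp("created_at").defaultNow().notNull(),
});

// relations() lives entirely in application code; it powers typed joins
// like db.query.assets.findMany({ with: { product: true } }) without DDL.
export const assetRelations = relations(assets, ({ one }) => ({
  product: one(products, {
    fields: [assets.productId],
    references: [products.id],
  }),
}));
```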

Consequences:

  • Application code enforces referential integrity
  • Background jobs include periodic orphan detection
  • Schema changes are simpler and non-blocking
  • Risk: orphaned rows possible if app bugs bypass checks
  • Cross-reference: relations (like the denotation link between assets and products) are documented in the schema naming and this specification, not enforced by database constraints

ADR-005: Queue-per-concern with typed discriminator

Context: The system has 8+ distinct background job types. Options: one queue per job type, or a single queue with typed messages.

Decision: Single queue (ingest-queue) with job_type discriminated union. Each message includes a job_type field that routes to the appropriate handler.

Consequences:

  • Simpler infrastructure (one queue binding)
  • Type-safe dispatch via Zod discriminated union (sketched below)
  • All jobs share retry/DLQ configuration
  • Trade-off: noisy-neighbor risk if one job type floods the queue (mitigated by batch size config)
  • Job types use ontological naming: PROCEDURE_EXECUTION_INITIATE, ICE_INGEST, DOCUMENT_ARTIFACT_GENERATE, MEASUREMENT_DATASET_REFRESH, etc. (see logical architecture job contracts)
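
A sketch of the envelope and consumer dispatch using the job types above; the payload fields are hypothetical:

```ts
import { z } from "zod";

// Job envelope; payload fields are illustrative.
const JobMessage = z.discriminatedUnion("job_type", [
  z.object({ job_type: z.literal("PROCEDURE_EXECUTION_INITIATE"), procedure: z.string() }),
  z.object({ job_type: z.literal("ICE_INGEST"), object_key: z.string() }),
  z.object({ job_type: z.literal("DOCUMENT_ARTIFACT_GENERATE"), document_id: z.string() }),
  z.object({ job_type: z.literal("MEASUREMENT_DATASET_REFRESH"), dataset: z.string() }),
]);
type JobMessage = z.infer<typeof JobMessage>;

type Env = Record<string, unknown>; // bindings elided

// Single ingest-queue consumer: parse, then dispatch on the discriminator.
export default {
  async queue(batch: MessageBatch<unknown>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const job = JobMessage.parse(msg.body); // malformed messages throw, then retry/DLQ
      switch (job.job_type) {
        case "ICE_INGEST":
          // job is narrowed here: job.object_key is typed as string
          break;
        // ...one case per job type
      }
      msg.ack();
    }
  },
};
```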

ADR-006: Cloudflare Pipelines for streaming event ingestion

Context: The system needs to capture commerce events (order lifecycle, payment state changes, fulfillment updates) and publication telemetry (clicks, impressions, views) for analytical querying. The current ingest worker uses a manual queue → R2 archival pattern. Cloudflare launched Pipelines (open beta, Arroyo-based) — a managed streaming ingestion service with SQL transforms that lands data as Apache Iceberg tables on R2 Data Catalog.

Decision: Use Cloudflare Pipelines for streaming event ingestion. Workers emit enriched events via env.STREAM.send() bindings; Pipelines applies lightweight SQL reshaping and lands Iceberg/Parquet tables on R2 Data Catalog, partitioned by day.
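
A minimal sketch of the producer side; the event shape is illustrative, and the binding is typed loosely since the beta API surface may shift — send() matches the call named above:

```ts
// Illustrative event shape; enrichment fields are added by the Worker (ADR-007).
interface CommerceEvent {
  event_type: "order_created" | "payment_captured";
  occurred_at: string; // ISO timestamp; the pipeline partitions output by day
  order_id: string;
  [context: string]: unknown; // denormalized context
}

// Loose binding type for the beta surface.
interface Env {
  STREAM: { send(event: object): Promise<void> };
}

export async function emit(env: Env, event: CommerceEvent): Promise<void> {
  await env.STREAM.send(event);
}
```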

Consequences:

  • Replaces the manual queue → R2 archival pattern for event data with managed streaming
  • Orchestrated multi-step procedures (Shopify Bulk Sync, mart refresh) remain as Workers + Queues — Pipelines is stateless and cannot do API orchestration, polling, or watermark management
  • R2 Data Catalog provides a standard Apache Iceberg REST catalog interface, queryable by DuckDB, Spark, Trino, ClickHouse, or Cloudflare’s own R2 SQL
  • Pipelines SQL transforms are limited to stateless row-level operations (filter, reshape, cast, regex) — business logic stays in Worker code
  • Open beta constraints: 20 streams/account, 5 MB/s per stream, 1 MB max payload per request
  • Zero cost during beta beyond standard R2 storage; future pricing based on volume processed/transformed/delivered

ADR-007: Worker-mediated enrichment before stream push

Context: Events need enrichment with data from PlanetScale (resolve product details, look up publisher context, update order status) before landing on R2. Cloudflare Pipelines can only do stateless SQL transforms — it cannot read from PlanetScale or any external database.

Decision: Workers perform all PG reads/writes (domain operations, enrichment) before pushing enriched events to Pipeline streams. The Pipeline’s SQL transform handles only final reshaping, not business logic.

Consequences:

  • The Worker is the orchestration layer between PG and Pipelines — event → Worker (read/write PG, enrich) → env.STREAM.send(enriched_event) → Pipeline → R2 (sketched below)
  • Events arrive on R2 already enriched with denormalized context (product names, publisher slugs, program details) — analytical queries don’t need to join back to PG
  • Pipeline SQL transforms are thin: field selection, type coercion, timestamp normalization — not lookups or joins
  • Trade-off: Worker latency increases slightly (PG round-trip before stream push), but enrichment happens once at ingest rather than on every analytical query
  • Cross-reference: aligns with ADR-005 — the ingest queue already routes typed messages through Workers; Pipelines adds a second output path alongside PG writes
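
A sketch of that flow for a publication click; the PG lookups are hypothetical helpers standing in for Drizzle queries, and the binding name follows ADR-008:

```ts
// Hypothetical PG lookups via the shared Drizzle client; signatures illustrative.
declare function getProduct(id: number): Promise<{ title: string } | null>;
declare function getPublisher(id: number): Promise<{ slug: string } | null>;

interface Env {
  PUBLICATION_TELEMETRY: { send(event: object): Promise<void> };
}

export async function handleClick(
  raw: { item_id: number; publisher_id: number; ts: string },
  env: Env,
): Promise<void> {
  // 1. Read context from PG once, at ingest time.
  const [product, publisher] = await Promise.all([
    getProduct(raw.item_id),
    getPublisher(raw.publisher_id),
  ]);

  // 2. Denormalize so analytical queries never join back to PG.
  const enriched = {
    event_type: "click",
    occurred_at: raw.ts,
    item_id: raw.item_id,
    product_name: product?.title ?? null,
    publisher_slug: publisher?.slug ?? null,
  };

  // 3. Push to the stream; the Pipeline SQL transform only reshapes from here.
  await env.PUBLICATION_TELEMETRY.send(enriched);
}
```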

ADR-008: Three telemetry streams

Context: Events from the commerce platform vary in volume, shape, and downstream consumers. A single stream would mix high-volume telemetry (clicks, impressions) with low-volume operational events (sync completions), complicating Pipeline SQL transforms and Iceberg table partitioning.

Decision: Three dedicated Pipeline streams, each with its own sink and Iceberg table:

  • COMMERCE_TELEMETRY: order_created, order_completed, payment_captured, payment_refunded, fulfillment_shipped, return_requested. Medium volume (order throughput). Iceberg table: commerce.telemetry
  • PUBLICATION_TELEMETRY: click, impression, view, engagement, conversion — enriched with publisher/item/program context. High volume (every visitor interaction). Iceberg table: publication.telemetry
  • SYNC_TELEMETRY: procedure completions with record counts, watermark positions, durations. Low volume (operational audit trail). Iceberg table: sync.telemetry

Consequences:

  • Each stream has independent Pipeline SQL transforms tuned to its event shape
  • PUBLICATION_TELEMETRY is the highest-volume stream — may skip PG entirely for raw engagement events, writing only to Pipelines (with Workers Analytics Engine handling real-time counters)
  • SYNC_TELEMETRY doubles as an operational audit log — procedure execution history lands on R2 for long-term retention beyond PlanetScale
  • Iceberg tables are independently compactable and partitionable — high-volume publication telemetry can use hourly partitions while low-volume sync telemetry stays daily
  • Stream count (3) is well within the beta limit of 20 streams per account
  • Naming convention: stream bindings use SCREAMING_SNAKE, Iceberg tables use namespace.telemetry (see the binding sketch below)
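
A sketch of the three bindings and a thin router from event family to stream; the Stream type is illustrative:

```ts
// Loose binding type for the beta surface.
interface Stream {
  send(event: object): Promise<void>;
}

interface Env {
  COMMERCE_TELEMETRY: Stream;    // lands on commerce.telemetry
  PUBLICATION_TELEMETRY: Stream; // lands on publication.telemetry
  SYNC_TELEMETRY: Stream;        // lands on sync.telemetry
}

type EventFamily = "commerce" | "publication" | "sync";

// Exhaustive switch: adding a fourth family is a compile error until routed.
export function streamFor(env: Env, family: EventFamily): Stream {
  switch (family) {
    case "commerce": return env.COMMERCE_TELEMETRY;
    case "publication": return env.PUBLICATION_TELEMETRY;
    case "sync": return env.SYNC_TELEMETRY;
  }
}
```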

ADR-009: No pipeline state tables in PG — delegate to Cloudflare durable execution

Context: The Astro doc site schema included procedure_execution_state (watermark/cursor tracking) and procedure_execution_record (ingest run history) as PlanetScale tables. These exist solely to support restartable, multi-step pipeline procedures (Shopify Bulk Sync). Cloudflare now offers durable execution primitives — Workflows (step-based orchestration with automatic retries and replay) and Durable Objects (stateful singletons with persistent storage).

Decision: Do not add pipeline orchestration tables to PlanetScale. Delegate execution state, watermark management, and run history to Cloudflare Workflows (for multi-step procedures) and Durable Objects (for cursor/watermark state).

Consequences:

  • PlanetScale stays purely domain/transactional — no operational infrastructure tables
  • Watermark state lives in Durable Object storage (key-value, persisted automatically; see the sketch below)
  • Run history and step tracking are handled by Workflow execution logs
  • Pipeline audit trail (completions, errors, durations) is captured by the SYNC_TELEMETRY Pipeline stream (ADR-008) and lands on R2 as Iceberg tables for long-term retention
  • Trade-off: pipeline state is less queryable via SQL — compensated by the sync telemetry Iceberg tables which DuckDB can query
  • Eliminates 2 tables from the PG schema that were pure infrastructure, not domain entities
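
A minimal sketch of the watermark holder as a Durable Object; the class name and storage key are hypothetical, with one instance per sync procedure:

```ts
import { DurableObject } from "cloudflare:workers";

// Hypothetical watermark DO; ctx.storage persists writes automatically.
export class SyncWatermark extends DurableObject {
  async get(): Promise<string | null> {
    return (await this.ctx.storage.get<string>("watermark")) ?? null;
  }

  async advance(next: string): Promise<void> {
    // Monotonic guard: never move the cursor backwards on retries.
    const current = await this.ctx.storage.get<string>("watermark");
    if (!current || next > current) {
      await this.ctx.storage.put("watermark", next);
    }
  }
}
```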

This specification adapts the Arcadia (ARChitecture Analysis and Design Integrated Approach) method — a model-based systems engineering approach that defines viewpoints flowing from operational need to build strategy:

  • Operational Analysis (OA) → Scenarios — stakeholders, workflows, operational goals
  • System Need Analysis (SA) → System Need Analysis — system boundary, external interfaces, non-functional constraints
  • Logical Architecture (LA) → Logical Architecture — tech-neutral modules and contracts
  • Physical Architecture (PA) → Physical Architecture — CF Workers, PlanetScale, R2 bindings
  • Building Strategy (BS) → Delivery Plan — phases, verification, cutover

Arcadia’s core value for this rewrite is the discipline of maintaining a shared architectural reference across engineering levels — ensuring that operational scenarios trace to logical components, which trace to physical infrastructure, which trace to verification artifacts.

Consistent naming conventions add a complementary discipline: ensuring that every entity, relation, and process follows established patterns, preventing naming drift and design erosion over time.