# Architecture Decision Records
Architecture Decision Records (ADRs) capture the “why” behind choices. This practice aligns with the Arcadia method’s emphasis on “capitalisation” — retaining decisions and justifications for long-term traceability.
## ADR-001: Cloudflare Access as sole auth plane

Context: The current system has two incompatible auth systems — Passport.js sessions (volatile, in-memory) and Supabase Auth (JWT, frontend only). Staff access is internal-only.
Decision: Use Cloudflare Access with Google SSO as the single authentication layer.
Consequences:
- No auth UI in the app — Access gate handles login before the app loads
- Every request carries the `Cf-Access-Jwt-Assertion` header; Worker middleware validates and extracts identity
- No RBAC in v1 — all authenticated staff have equal access (deferred per SOW)
- Service-to-service calls (cron, queues) use Access service tokens
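The middleware step above can be sketched as follows. This is a minimal illustration of extracting identity claims from the Access JWT; the function and type names are assumptions, and a real deployment must first verify the token signature against the Access team's public keys (e.g. with `jose`'s `createRemoteJWKSet`) rather than trusting a decoded payload.

```typescript
interface AccessIdentity {
  email: string;
  sub: string;
}

// Base64url → UTF-8, restoring padding the JWT format omits.
function decodeBase64Url(segment: string): string {
  let b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  return atob(b64);
}

// Decode (NOT verify) the payload of a Cf-Access-Jwt-Assertion token
// and pull out the identity claims the app cares about.
function extractAccessIdentity(jwt: string): AccessIdentity {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("malformed Access JWT");
  const payload = JSON.parse(decodeBase64Url(parts[1]));
  if (typeof payload.email !== "string" || typeof payload.sub !== "string") {
    throw new Error("Access JWT missing identity claims");
  }
  return { email: payload.email, sub: payload.sub };
}
```

Because Access sits in front of the app, this is the only place identity enters the system; handlers downstream receive a trusted `AccessIdentity`, never a session.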
## ADR-002: GraphQL-first Shopify integration

Context: Shopify states the REST Admin API is “legacy” as of October 2024 and recommends GraphQL. The current codebase uses REST in 6 independent implementations.
Decision: Use GraphQL Admin API as the primary integration, with bulk operations for high-volume data sync.
Consequences:
- Bulk operations are async (poll-based), requiring queue orchestration
- Concurrency limits apply: one bulk query per shop on API version 2024-04 (five on 2026-01+)
- REST retained only for endpoints without GraphQL equivalents, isolated behind the shared Shopify client
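The async bulk-operation flow can be sketched as below. The `bulkOperationRunQuery` mutation and `CREATED`/`RUNNING`/`COMPLETED` statuses come from Shopify's GraphQL Admin API; the helper names and the queue-oriented poll logic are illustrative assumptions.

```typescript
// Mutation that kicks off a bulk export; Shopify runs the inner query
// asynchronously and exposes progress via currentBulkOperation.
const BULK_RUN_MUTATION = /* GraphQL */ `
  mutation runBulk($query: String!) {
    bulkOperationRunQuery(query: $query) {
      bulkOperation { id status }
      userErrors { field message }
    }
  }
`;

type BulkStatus =
  | "CREATED" | "RUNNING" | "COMPLETED"
  | "CANCELED" | "FAILED" | "EXPIRED";

// Decide the next queue action after each poll of the operation status.
function nextPollAction(status: BulkStatus): "wait" | "download" | "abort" {
  switch (status) {
    case "CREATED":
    case "RUNNING":
      return "wait";     // re-enqueue a poll message with backoff
    case "COMPLETED":
      return "download"; // fetch the JSONL result from the signed URL
    default:
      return "abort";    // CANCELED / FAILED / EXPIRED → DLQ + alerting
  }
}
```

Keeping the poll decision as a pure function makes the queue orchestration (ADR-005) trivially testable, independent of Shopify itself.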
## ADR-003: No stored procedures or materialized views

Context: The current system uses PostgreSQL RPCs (`get_tag_categories()`, `refresh_tag_performance_aggregate()`, etc.) and materialized views. The rewrite moves all business logic to application code for portability and testability.
Decision: Move all business logic into application code. Replace materialized views with DuckDB aggregation over R2 Iceberg tables (measurement.performance, measurement.sales) triggered by CF Workflow cron jobs.
Consequences:
- All RPC logic migrated to TypeScript in Workers
- Mart tables are regular tables, rebuilt by background jobs (workflow executions)
- A `last_refreshed` timestamp tracks freshness
- More app code, but fully testable and portable
- Cross-reference: naming conventions and relations are documentation, not runtime enforcement
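As a sketch of what one migrated RPC might look like as application code, the following rebuilds a tag-performance mart table in plain TypeScript. Row shapes and names are illustrative; in production this would run inside a CF Workflow step that reads measurement rows (via DuckDB over the Iceberg tables) and writes the mart table back, stamping `last_refreshed`.

```typescript
interface MeasurementRow {
  tag: string;
  clicks: number;
  sales: number;
}

interface TagPerformanceMart {
  rows: Array<{ tag: string; clicks: number; sales: number }>;
  last_refreshed: string; // ISO timestamp, per ADR-003 freshness tracking
}

// Aggregate raw measurement rows per tag, replacing the old
// refresh_tag_performance_aggregate() materialized-view refresh.
function buildTagPerformanceMart(
  source: MeasurementRow[],
  now: Date = new Date(),
): TagPerformanceMart {
  const byTag = new Map<string, { clicks: number; sales: number }>();
  for (const r of source) {
    const acc = byTag.get(r.tag) ?? { clicks: 0, sales: 0 };
    acc.clicks += r.clicks;
    acc.sales += r.sales;
    byTag.set(r.tag, acc);
  }
  return {
    rows: [...byTag].map(([tag, a]) => ({ tag, ...a })),
    last_refreshed: now.toISOString(),
  };
}
```

The pure-function shape is the point: the same logic that was previously locked inside a Postgres RPC is now unit-testable without a database.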
## ADR-004: Foreign keys as documentation, not enforcement

Context: PlanetScale supports foreign keys but recommends carefully weighing trade-offs — constraints increase locking and can create friction in high-concurrency workloads and online schema changes.
Decision: Define relationships in Drizzle schema (for type safety and documentation) but do not enforce FK constraints at the database level.
Consequences:
- Application code enforces referential integrity
- Background jobs include periodic orphan detection
- Schema changes are simpler and non-blocking
- Risk: orphaned rows possible if app bugs bypass checks
- Cross-reference: relations (like the denotation link between assets and products) are documented in the schema naming and this specification, not enforced by database constraints
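The periodic orphan detection mentioned above can be sketched as a pure check; the row shape and names here are hypothetical, and the real job would page through PlanetScale in batches rather than hold both sides in memory.

```typescript
interface ChildRow {
  id: string;
  parentId: string;
}

// Return child rows whose parent no longer exists. With FK constraints
// disabled (ADR-004), this is the safety net behind app-level integrity.
function findOrphans(children: ChildRow[], parentIds: Set<string>): ChildRow[] {
  return children.filter((c) => !parentIds.has(c.parentId));
}
```

A background job would run this per relation and alert (or quarantine rows) on a non-empty result, closing the gap left by unenforced constraints.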
## ADR-005: Queue-per-concern with typed discriminator

Context: The system has 8+ distinct background job types. Two options were considered: one queue per job type, or a single queue with typed messages.
Decision: Single queue (`ingest-queue`) with a `job_type` discriminated union. Each message includes a `job_type` field that routes to the appropriate handler.
Consequences:
- Simpler infrastructure (one queue binding)
- Type-safe dispatch via Zod discriminated union
- All jobs share retry/DLQ configuration
- Trade-off: noisy-neighbor risk if one job type floods the queue (mitigated by batch size config)
- Job types use ontological naming: `PROCEDURE_EXECUTION_INITIATE`, `ICE_INGEST`, `DOCUMENT_ARTIFACT_GENERATE`, `MEASUREMENT_DATASET_REFRESH`, etc. (see logical architecture job contracts)
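The dispatch pattern can be sketched in dependency-free TypeScript. The production system validates messages with Zod's `z.discriminatedUnion("job_type", …)` before dispatch; the payload fields below (and the subset of job types shown) are illustrative.

```typescript
// A subset of the ingest-queue message union, keyed by job_type.
type IngestJob =
  | { job_type: "PROCEDURE_EXECUTION_INITIATE"; procedure: string }
  | { job_type: "ICE_INGEST"; objectKey: string }
  | { job_type: "MEASUREMENT_DATASET_REFRESH"; dataset: string };

// Exhaustive switch: TypeScript narrows each branch by job_type, so
// adding a new job type without a handler is a compile error.
function handleJob(job: IngestJob): string {
  switch (job.job_type) {
    case "PROCEDURE_EXECUTION_INITIATE":
      return `start procedure ${job.procedure}`;
    case "ICE_INGEST":
      return `ingest ${job.objectKey}`;
    case "MEASUREMENT_DATASET_REFRESH":
      return `refresh ${job.dataset}`;
  }
}
```

This is the mitigation for the single-queue trade-off: one binding at the infrastructure level, but compiler-checked routing at the application level.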
## ADR-006: Cloudflare Pipelines for streaming event ingestion {#adr-006}

Context: The system needs to capture commerce events (order lifecycle, payment state changes, fulfillment updates) and publication telemetry (clicks, impressions, views) for analytical querying. The current ingest worker uses a manual queue → R2 archival pattern. Cloudflare launched Pipelines (open beta, Arroyo-based) — a managed streaming ingestion service with SQL transforms that lands data as Apache Iceberg tables on R2 Data Catalog.
Decision: Use Cloudflare Pipelines for streaming event ingestion. Workers emit enriched events via `env.STREAM.send()` bindings; Pipelines applies lightweight SQL reshaping and lands Iceberg/Parquet tables on R2 Data Catalog, partitioned by day.
Consequences:
- Replaces the manual queue → R2 archival pattern for event data with managed streaming
- Orchestrated multi-step procedures (Shopify Bulk Sync, mart refresh) remain as Workers + Queues — Pipelines is stateless and cannot do API orchestration, polling, or watermark management
- R2 Data Catalog provides a standard Apache Iceberg REST catalog interface, queryable by DuckDB, Spark, Trino, ClickHouse, or Cloudflare’s own R2 SQL
- Pipelines SQL transforms are limited to stateless row-level operations (filter, reshape, cast, regex) — business logic stays in Worker code
- Open beta constraints: 20 streams/account, 5 MB/s per stream, 1 MB max payload per request
- Zero cost during beta beyond standard R2 storage; future pricing based on volume processed/transformed/delivered
## ADR-007: Worker-mediated enrichment before stream push {#adr-007}

Context: Events need enrichment with data from PlanetScale (resolve product details, look up publisher context, update order status) before landing on R2. Cloudflare Pipelines can only do stateless SQL transforms — it cannot read from PlanetScale or any external database.
Decision: Workers perform all PG reads/writes (domain operations, enrichment) before pushing enriched events to Pipeline streams. The Pipeline’s SQL transform handles only final reshaping, not business logic.
Consequences:
- The Worker is the orchestration layer between PG and Pipelines — event → Worker (read/write PG, enrich) → `env.STREAM.send(enriched_event)` → Pipeline → R2
- Events arrive on R2 already enriched with denormalized context (product names, publisher slugs, program details) — analytical queries don’t need to join back to PG
- Pipeline SQL transforms are thin: field selection, type coercion, timestamp normalization — not lookups or joins
- Trade-off: Worker latency increases slightly (PG round-trip before stream push), but enrichment happens once at ingest rather than on every analytical query
- Cross-reference: aligns with ADR-005 — the ingest queue already routes typed messages through Workers; Pipelines adds a second output path alongside PG writes
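The enrich-then-send flow can be sketched as below. `lookupProduct` stands in for a PlanetScale read, and the `stream` parameter mirrors the shape of the `env.STREAM` binding; event and context shapes are illustrative assumptions.

```typescript
interface RawClick { itemId: string; ts: string }
interface ProductContext { name: string; publisherSlug: string }
interface EnrichedClick extends RawClick { productName: string; publisherSlug: string }

async function enrichAndSend(
  raw: RawClick,
  lookupProduct: (id: string) => Promise<ProductContext>, // PG read, stubbed here
  stream: { send: (e: EnrichedClick) => Promise<void> },  // env.STREAM-shaped binding
): Promise<EnrichedClick> {
  // 1. Read from PG once, at ingest time (ADR-007).
  const ctx = await lookupProduct(raw.itemId);
  // 2. Denormalize so analytical queries never join back to PG.
  const enriched: EnrichedClick = {
    ...raw,
    productName: ctx.name,
    publisherSlug: ctx.publisherSlug,
  };
  // 3. Push to the Pipeline stream; the SQL transform only reshapes after this.
  await stream.send(enriched);
  return enriched;
}
```

Injecting the lookup and stream as parameters keeps the Worker's orchestration logic testable without a database or a Pipelines binding.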
## ADR-008: Three telemetry streams {#adr-008}

Context: Events from the commerce platform vary in volume, shape, and downstream consumers. A single stream would mix high-volume telemetry (clicks, impressions) with low-volume operational events (sync completions), complicating Pipeline SQL transforms and Iceberg table partitioning.
Decision: Three dedicated Pipeline streams, each with its own sink and Iceberg table:
| Stream | Events | Volume | Iceberg table |
|---|---|---|---|
| `COMMERCE_TELEMETRY` | order_created, order_completed, payment_captured, payment_refunded, fulfillment_shipped, return_requested | Medium (order throughput) | `commerce.telemetry` |
| `PUBLICATION_TELEMETRY` | click, impression, view, engagement, conversion — enriched with publisher/item/program context | High (every visitor interaction) | `publication.telemetry` |
| `SYNC_TELEMETRY` | procedure completions with record counts, watermark positions, durations | Low (operational audit trail) | `sync.telemetry` |
Consequences:
- Each stream has independent Pipeline SQL transforms tuned to its event shape
- `PUBLICATION_TELEMETRY` is the highest-volume stream — may skip PG entirely for raw engagement events, writing only to Pipelines (with Workers Analytics Engine handling real-time counters)
- `SYNC_TELEMETRY` doubles as an operational audit log — procedure execution history lands on R2 for long-term retention beyond PlanetScale
- Iceberg tables are independently compactable and partitionable — high-volume publication telemetry can use hourly partitions while low-volume sync telemetry stays daily
- Stream count (3) is well within the beta limit of 20 streams per account
- Naming convention: stream bindings use `SCREAMING_SNAKE`; Iceberg tables use `namespace.telemetry`
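Routing an event to the right stream binding follows directly from the table above. Event names come from the table; the router function itself is an illustrative sketch, not the production implementation.

```typescript
type StreamBinding =
  | "COMMERCE_TELEMETRY"
  | "PUBLICATION_TELEMETRY"
  | "SYNC_TELEMETRY";

const COMMERCE_EVENTS = new Set([
  "order_created", "order_completed", "payment_captured",
  "payment_refunded", "fulfillment_shipped", "return_requested",
]);

const PUBLICATION_EVENTS = new Set([
  "click", "impression", "view", "engagement", "conversion",
]);

// Map an event type to one of the three dedicated Pipeline streams.
function streamFor(eventType: string): StreamBinding {
  if (COMMERCE_EVENTS.has(eventType)) return "COMMERCE_TELEMETRY";
  if (PUBLICATION_EVENTS.has(eventType)) return "PUBLICATION_TELEMETRY";
  return "SYNC_TELEMETRY"; // procedure completions and other operational events
}
```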
## ADR-009: No pipeline state tables in PG — delegate to Cloudflare durable execution {#adr-009}

Context: The Astro doc site schema included `procedure_execution_state` (watermark/cursor tracking) and `procedure_execution_record` (ingest run history) as PlanetScale tables. These exist solely to support restartable, multi-step pipeline procedures (Shopify Bulk Sync). Cloudflare now offers durable execution primitives — Workflows (step-based orchestration with automatic retries and replay) and Durable Objects (stateful singletons with persistent storage).
Decision: Do not add pipeline orchestration tables to PlanetScale. Delegate execution state, watermark management, and run history to Cloudflare Workflows (for multi-step procedures) and Durable Objects (for cursor/watermark state).
Consequences:
- PlanetScale stays purely domain/transactional — no operational infrastructure tables
- Watermark state lives in Durable Object storage (key-value, persisted automatically)
- Run history and step tracking are handled by Workflow execution logs
- Pipeline audit trail (completions, errors, durations) is captured by the `SYNC_TELEMETRY` Pipeline stream (ADR-008) and lands on R2 as Iceberg tables for long-term retention
- Trade-off: pipeline state is less queryable via SQL — compensated by the sync telemetry Iceberg tables which DuckDB can query
- Eliminates 2 tables from the PG schema that were pure infrastructure, not domain entities
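Watermark management in Durable Object storage can be sketched as below. The `Storage` interface mirrors the get/put shape of the DO storage API, and the key name is an assumption; the in-memory version exists only for illustration.

```typescript
interface Storage {
  get(key: string): Promise<string | undefined>;
  put(key: string, value: string): Promise<void>;
}

const WATERMARK_KEY = "shopify-bulk-sync:watermark"; // hypothetical key

// Advance the watermark only forwards, so replayed or out-of-order
// completions (Workflows may retry steps) can never move the cursor back.
async function advanceWatermark(
  storage: Storage,
  candidateIso: string,
): Promise<string> {
  const current = await storage.get(WATERMARK_KEY);
  // ISO-8601 timestamps compare correctly as strings.
  const next = current !== undefined && current >= candidateIso ? current : candidateIso;
  await storage.put(WATERMARK_KEY, next);
  return next;
}
```

Making the advance monotonic is what lets Workflows retry freely: a replayed step that re-submits an older timestamp is a no-op rather than a regression.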
## Arcadia method

This specification adapts the Arcadia (ARChitecture Analysis and Design Integrated Approach) method — a model-based systems engineering approach that defines viewpoints flowing from operational need to build strategy:
| Viewpoint | This spec’s mapping |
|---|---|
| Operational Analysis (OA) | Scenarios — stakeholders, workflows, operational goals |
| System Need Analysis (SA) | System Need Analysis — system boundary, external interfaces, non-functional constraints |
| Logical Architecture (LA) | Logical Architecture — tech-neutral modules and contracts |
| Physical Architecture (PA) | Physical Architecture — CF Workers, PlanetScale, R2 bindings |
| Building Strategy (BS) | Delivery Plan — phases, verification, cutover |
Arcadia’s core value for this rewrite is the discipline of maintaining a shared architectural reference across engineering levels — ensuring that operational scenarios trace to logical components, which trace to physical infrastructure, which trace to verification artifacts.
Consistent naming conventions add a complementary discipline: ensuring that every entity, relation, and process follows established patterns, preventing naming drift and design erosion over time.