Daily Shopify Bulk Sync
Overview
Section titled “Overview”Staff-facing dashboards require fresh sales metrics derived from order headers and order line items. The integration is loosely coupled and asynchronous, implemented via Cron + CF Workflow steps.
Goal: Execute a multi-step CF Workflow that upserts order_header / order_line_item → refreshes product → writes measurement.sales and measurement.performance to R2 for performance tracking and analytics.
API endpoints
Section titled “API endpoints”| Method | Path | Spec | Description |
|---|---|---|---|
| POST | /api/performance-metrics/sync | analytics.yaml | Manual trigger (starts same CF Workflow) |
| GET | /api/ingest/runs | analytics.yaml | List recent sync telemetry records with status |
| GET | /orders | order.yaml | List orders (staff, paginated) |
| GET | /orders/{order_id} | order.yaml | Single order detail |
| GET | /analytics/commerce/overview | analytics.yaml | Commerce summary KPIs for date range |
| GET | /analytics/pipelines | analytics.yaml | Pipeline health (SYNC_TELEMETRY stream) |
Schema tables
Section titled “Schema tables”| Table | Module | Role |
|---|---|---|
order_header | Order | Write: upserted from Shopify bulk data |
order_line_item | Order | Write: upserted from Shopify bulk data |
product | Product | Write: refreshed after order upsert |
Infrastructure
Section titled “Infrastructure”| Resource | Type | Purpose |
|---|---|---|
| CF Cron Trigger | Cron | Daily 6am UTC trigger |
ingest | Worker (CF Workflow) | Orchestrates multi-step sync via Durable Objects |
SYNC_TELEMETRY | Pipeline | Execution audit trail to R2 |
COMMERCE_TELEMETRY | Pipeline | Order/fulfillment event streaming |
| R2 bucket | Object storage | JSONL archives, measurement data, telemetry |
| Durable Object | State store | Watermark tracking |
Participants
Section titled “Participants”| Actor | Role |
|---|---|
| CF Cron Trigger | Fires daily at 6am UTC — scheduled trigger |
| CF Workflow | Orchestrates multi-step sync via Durable Objects |
| Shopify Admin API | Source of order data |
| PlanetScale | Database for order records, metrics |
| R2 | Object storage for JSONL archives |
Preconditions
Section titled “Preconditions”- Shopify Admin API access token configured as a Wrangler secret
- PlanetScale reachable via Hyperdrive
- Durable Object contains a valid watermark (hard floor: 128 days back)
Main success scenario (CF Workflow steps)
Section titled “Main success scenario (CF Workflow steps)”- Step: Initiate — Cron fires → triggers CF Workflow run
- Workflow records telemetry to
sync.telemetryon R2 (run_id,process_initiated_at, step=initiated) - Step: Request — Workflow initiates Shopify
bulkOperationRunQuerywith orders/line-items query - Step: Poll — Workflow polls Shopify with
bulk_operation_idevery 16 seconds, up to 128 attempts (~34 min max), untilCOMPLETEDorFAILED - Step: Download — On
COMPLETED, Workflow downloads JSONL results and stores in R2 (stored file for audit/replay) - Step: Parse + Upsert — Workflow parses JSONL in chunks and upserts into
order_headerandorder_line_item - Step: Transform — Workflow refreshes
product, writesmeasurement.salesandmeasurement.performanceto R2 - Step: Finalize — Workflow updates watermark in Durable Object and writes completion to
sync.telemetryon R2
sequenceDiagram
participant Cron as CF Cron
participant WF as CF Workflow
participant Shopify as Shopify API
participant DB as PlanetScale
participant R2 as R2
Cron->>WF: trigger workflow run
WF->>R2: write sync.telemetry(initiated)
WF->>Shopify: bulkOperationRunQuery
WF->>Shopify: poll bulk status (16s interval)
Note over WF,Shopify: retry poll if pending (max 128 attempts)
alt COMPLETED
WF->>Shopify: download JSONL
WF->>R2: put(JSONL archive)
WF->>DB: upsert order_header (chunked)
WF->>DB: upsert order_line_item (chunked)
WF->>DB: refresh product
WF->>R2: write measurement.sales + measurement.performance
WF->>R2: write sync.telemetry(completed) + update DO watermark
else FAILED/RATE-LIMITED
WF->>WF: retry with backoff
end
Exception flows
Section titled “Exception flows”| Error condition | Recovery |
|---|---|
| Bulk op already running | Delay retry — only one concurrent bulk query on API version 2024-04 |
| Shopify rate limit / transient failure | Queue retry + delay (up to 12 hours) |
| PlanetScale row limit exceeded | Reduce batch size, resume from last committed chunk |
| JSONL parse error | Mark execution as failed, store error in sync.telemetry on R2 |
Workflow parameters
Section titled “Workflow parameters”Workflow trigger payload
{ "run_id": "uuid", "initiator": "cron|user", "requested_at": "ISO8601"}Performance SLA
Section titled “Performance SLA”Daily incremental CF Workflow completes in < 15 minutes (async). Poll budget: 128 attempts x 16s = ~34 min hard ceiling. Chunk DB writes to avoid 20s transaction timeout.
Acceptance criteria
Section titled “Acceptance criteria”- PASS:
sync.telemetryon R2 shows completed execution;measurement.performanceon R2 updated; sample products show correct 7/30-day sales metrics - FAIL: Execution ends in
failed; watermark regressions; duplicatemeasurement.salesentries after re-run
Watermark strategy
Section titled “Watermark strategy”The incremental sync uses the watermark in the Durable Object to query only orders modified since the last successful run.
- Hard floor: 128 days. If the watermark is older than 128 days (or missing), it resets to
NOW() - 128 days. This bounds the maximum data volume per incremental run and prevents unbounded Shopify bulk queries. - Advance on success only: the watermark advances to
MAX(updated_at)from the fetched batch only after all Steps complete. A failed run leaves the watermark unchanged — the next run retries from the same position.
Full backfill
Section titled “Full backfill”Historical data loads beyond the 128-day window use a separate Durable Object workflow rather than the daily sync path. The backfill workflow:
- Accepts a custom date range (no floor constraint)
- Runs as its own workflow execution with a distinct
job_type(FULL_BACKFILL_EXECUTE) - Uses the same parse/upsert/transform Steps but with larger chunk budgets
- Does not advance the daily watermark — the two are independent
- Can run concurrently with daily sync since Shopify allows one bulk op per app, but the backfill workflow handles the “bulk op already running” error with longer retry delays
Summary (7 buckets)
Section titled “Summary (7 buckets)”| Bucket | Answer |
|---|---|
| WHAT | Bulk data acquisition workflow (order data from Shopify) |
| WHO | CF Workflow, Shopify API, Cron Trigger |
| WHEN | Daily 6am UTC (cron); on-demand via POST /api/performance-metrics/sync |
| WHERE | CF Workflow / Durable Objects, R2 bucket, PlanetScale tables |
| HOW WE KNOW | sync.telemetry on R2, JSONL archives (R2 objects), watermark in Durable Object |
| HOW IT IS | Execution status: initiated → polling → downloading → upserting → completed/failed |
| WHY | Keep dashboard data fresh; scheduled data currency |