Observability
Philosophy
Every workflow execution should be auditable and every error should be detectable without manual checking. This aligns with the Arcadia method’s emphasis on providing meaningful information for decision-makers.
ProcedureExecution-level auditability
The pipeline records every ProcedureExecution through two mechanisms:
- CF Workflows execution logs — Step-level tracking with automatic retries, durations, and error details
- sync.telemetry on R2 — Long-term Iceberg table via the SYNC_TELEMETRY Pipeline stream, retaining procedure completions, record counts, watermark positions, and durations
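A sync telemetry event might be assembled like the sketch below. The field names mirror the observables in the table that follows, but the exact stream schema is an assumption, not the pipeline's actual contract:

```ts
// Hypothetical shape of a sync telemetry event; field names are
// illustrative, not the SYNC_TELEMETRY stream's actual schema.
interface SyncTelemetryEvent {
  run_id: string;
  procedure: string;
  status: 'completed' | 'failed';
  orders_count: number;
  line_items_count: number;
  watermark: string; // e.g. last processed updated_at timestamp
  duration_ms: number;
}

function buildSyncEvent(
  runId: string,
  orders: number,
  lineItems: number,
  watermark: string,
  startedAt: number,
): SyncTelemetryEvent {
  return {
    run_id: runId,
    procedure: 'daily_sync',
    status: 'completed',
    orders_count: orders,
    line_items_count: lineItems,
    watermark,
    duration_ms: Date.now() - startedAt,
  };
}
```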
| Observable | Source | Ontological note |
|---|---|---|
| Run identity | Workflow execution ID | IAO Identifier |
| Duration | Workflow step timestamps | BFO Temporal Region boundaries |
| Current step | Workflow execution state | PKO ExecutionStatus |
| Volume metrics | Sync telemetry events (orders_count, line_items_count) | IAO Measurement Datum (ratio) |
| Errors | Workflow error state + sync telemetry error events | PKO error handling |
| Raw archive | R2 object key for JSONL IBE | IAO concretized_by reference |
Raw JSONL is archived in R2 for traceability and replay.
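A minimal sketch of the archival step, assuming a date-partitioned key layout (`sync/YYYY-MM-DD/<runId>.jsonl`) — the layout and the `RAW_BUCKET` binding name are illustrative conventions, not the project's confirmed ones:

```ts
// Sketch: serialize records to JSONL for R2 archival.
// One JSON object per line, trailing newline included.
function toJsonl(records: unknown[]): string {
  return records.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

// Assumed key convention: sync/YYYY-MM-DD/<runId>.jsonl
function archiveKey(runId: string, date: Date): string {
  return `sync/${date.toISOString().slice(0, 10)}/${runId}.jsonl`;
}

// In a Worker (RAW_BUCKET is a hypothetical R2 binding):
// await env.RAW_BUCKET.put(archiveKey(runId, new Date()), toJsonl(records));
```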
Workers logging
Cloudflare Workers provide structured logging via console.log() + Workers Logs:
```ts
console.log(JSON.stringify({
  level: 'info',
  event: 'procedure_execution_completed',
  run_id: runId,
  orders: count,
  duration_ms: Date.now() - startTime,
}));
```

Logpush forwards logs to external destinations (e.g., Datadog, S3, R2) for retention beyond Workers’ built-in log viewer.
Health endpoint
GET /api/health reports service availability:
```json
{
  "status": "ok",
  "database": "connected",
  "last_sync": {
    "workflow_id": "uuid",
    "completed_at": "ISO8601",
    "status": "completed"
  },
  "pipelines": {
    "commerce_telemetry": "healthy",
    "publication_telemetry": "healthy",
    "sync_telemetry": "healthy"
  }
}
```

Key metrics
| Metric | Source | Alert threshold |
|---|---|---|
| Daily sync duration | CF Workflow execution logs | > 15 minutes |
| Sync not completed | Workflow execution staleness | No completed execution in 26 hours |
| Workflow retry count | Workflow execution metrics | > 10 retries/hour |
| Pipeline throughput | Pipeline stream metrics | Sustained drop > 50% from baseline |
| API error rate | Workers analytics | > 5% 5xx in 5-minute window |
| Measurement freshness | measurement.performance Iceberg table metadata | > 2 hours since last write |
| Real-time counter lag | Workers Analytics Engine | > 5 minutes behind |
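The staleness checks in the table reduce to simple threshold comparisons. A sketch of the "sync not completed" check, using the 26-hour threshold above (the function name and inputs are illustrative):

```ts
// Sketch: evaluate the "sync not completed" staleness alert.
// The 26-hour threshold comes from the key-metrics table.
const STALE_SYNC_MS = 26 * 60 * 60 * 1000;

function isSyncStale(
  lastCompletedAt: string | null,
  now: number = Date.now(),
): boolean {
  // No completed execution recorded at all counts as stale.
  if (!lastCompletedAt) return true;
  return now - Date.parse(lastCompletedAt) > STALE_SYNC_MS;
}
```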
Alerting
Alerts are derived from metrics and surfaced via Cloudflare notifications or an external webhook:
- Critical: Workflow execution failure, Pipeline stream error, database unreachable
- Warning: Workflow duration approaching limit, elevated error rate, stale measurement data
- Info: Successful Workflow completion, large batch processed
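For the external-webhook path, the alert payload could look like this sketch. The payload shape and the `ALERT_WEBHOOK_URL` binding are assumptions for illustration, not an existing API:

```ts
// Sketch: build an alert for dispatch to an external webhook.
// Severity levels mirror the tiers listed above.
type Severity = 'critical' | 'warning' | 'info';

interface Alert {
  severity: Severity;
  title: string;
  detail?: string;
  ts: string; // ISO 8601 timestamp
}

function buildAlert(severity: Severity, title: string, detail?: string): Alert {
  return { severity, title, detail, ts: new Date().toISOString() };
}

// In a Worker (ALERT_WEBHOOK_URL is a hypothetical env binding):
// await fetch(env.ALERT_WEBHOOK_URL, {
//   method: 'POST',
//   headers: { 'content-type': 'application/json' },
//   body: JSON.stringify(buildAlert('critical', 'Workflow execution failure')),
// });
```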
Request tracing
Every request gets a unique ID for correlation:
```ts
app.use('*', async (c, next) => {
  const requestId = c.req.header('cf-ray') || crypto.randomUUID();
  c.set('requestId', requestId);
  c.header('X-Request-Id', requestId);
  await next();
});
```

All log entries within a request include the request ID for end-to-end tracing.
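Stamping the request ID onto each log entry can be done with a small helper like this sketch (the helper name and field names are an assumed convention, following the JSON log style used elsewhere in this document):

```ts
// Sketch: build a structured log line that carries the request ID,
// so every entry within a request can be correlated end-to-end.
function logWithRequestId(
  requestId: string,
  event: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    level: 'info',
    event,
    request_id: requestId,
    ...fields,
  });
}

// In a handler: console.log(logWithRequestId(c.get('requestId'), 'db_query', { ms: 12 }));
```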