Skip to content

Observability

Every workflow execution should be auditable and every error should be detectable without manual checking. This aligns with the Arcadia method’s emphasis on providing meaningful information for decision-makers.

The pipeline records every ProcedureExecution through two mechanisms:

  1. CF Workflows execution logs — Step-level tracking with automatic retries, durations, and error details
  2. sync.telemetry on R2 — Long-term Iceberg table via the SYNC_TELEMETRY Pipeline stream, retaining procedure completions, record counts, watermark positions, and durations
ObservableSourceOntological note
Run identityWorkflow execution IDIAO Identifier
DurationWorkflow step timestampsBFO Temporal Region boundaries
Current stepWorkflow execution statePKO ExecutionStatus
Volume metricsSync telemetry events (orders_count, line_items_count)IAO Measurement Datum (ratio)
ErrorsWorkflow error state + sync telemetry error eventsPKO error handling
Raw archiveR2 object key for JSONL IBEIAO concretized_by reference

Raw JSONL is archived in R2 for traceability and replay.

Cloudflare Workers provide structured logging via console.log() + Workers Logs:

console.log(JSON.stringify({
level: 'info',
event: 'procedure_execution_completed',
run_id: runId,
orders: count,
duration_ms: Date.now() - startTime,
}));

Logpush forwards logs to external destinations (e.g., Datadog, S3, R2) for retention beyond Workers’ built-in log viewer.

GET /api/health reports service availability:

{
"status": "ok",
"database": "connected",
"last_sync": {
"workflow_id": "uuid",
"completed_at": "ISO8601",
"status": "completed"
},
"pipelines": {
"commerce_telemetry": "healthy",
"publication_telemetry": "healthy",
"sync_telemetry": "healthy"
}
}
MetricSourceAlert threshold
Daily sync durationCF Workflow execution logs> 15 minutes
Sync not completedWorkflow execution stalenessNo completed execution in 26 hours
Workflow retry countWorkflow execution metrics> 10 retries/hour
Pipeline throughputPipeline stream metricsSustained drop > 50% from baseline
API error rateWorkers analytics> 5% 5xx in 5-minute window
Measurement freshnessmeasurement.performance Iceberg table metadata> 2 hours since last write
Real-time counter lagWorkers Analytics Engine> 5 minutes behind

Alerts are derived from metrics and surfaced via Cloudflare notifications or external webhook:

  • Critical: Workflow execution failure, Pipeline stream error, database unreachable
  • Warning: Workflow duration approaching limit, elevated error rate, stale measurement data
  • Info: Successful Workflow completion, large batch processed

Every request gets a unique ID for correlation:

app.use('*', async (c, next) => {
const requestId = c.req.header('cf-ray') || crypto.randomUUID();
c.set('requestId', requestId);
c.header('X-Request-Id', requestId);
await next();
});

All log entries within a request include the request ID for end-to-end tracing.