
Observability

Every ProcedureExecution (PKO) should be auditable and every IssueOccurrence should be detectable without manual checking. This aligns with the Arcadia method’s emphasis on providing meaningful information for decision-makers.

The pipeline records every ProcedureExecution in procedure_execution_record:

| Field | Purpose | Ontological note |
| --- | --- | --- |
| run_id | Unique identifier for correlation | IAO Identifier |
| process_initiated_at / process_completed_at | Duration tracking | BFO Temporal Region boundaries |
| step | Current PKO Step (for resume/debug) | PKO ExecutionStatus |
| orders_count / line_items_count | Volume Measurement Data | IAO Measurement Datum (ratio) |
| error_text | IssueOccurrence details | PKO error handling |
| file_path | R2 bearer_entity_key for raw JSONL IBE | IAO concretized_by reference |

Raw JSONL is archived in R2 (IBE store) for traceability and replay.
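
To make the record layout concrete, here is a minimal TypeScript sketch of one row and of how the two timestamps yield a duration. The interface name, the column types, and the nullability are assumptions for illustration; the actual schema may differ.

```typescript
// Hypothetical row shape for procedure_execution_record.
// Column types and nullability are assumptions, not the actual schema.
interface ProcedureExecutionRecord {
  run_id: string;
  process_initiated_at: string;        // ISO 8601 timestamp
  process_completed_at: string | null; // null while the execution is running
  step: string;
  orders_count: number;
  line_items_count: number;
  error_text: string | null;           // null when no IssueOccurrence was recorded
  file_path: string | null;            // R2 key of the archived raw JSONL
}

// Duration in milliseconds, or null if the execution has not completed.
function executionDurationMs(rec: ProcedureExecutionRecord): number | null {
  if (rec.process_completed_at === null) return null;
  return Date.parse(rec.process_completed_at) - Date.parse(rec.process_initiated_at);
}

const exampleRecord: ProcedureExecutionRecord = {
  run_id: 'run-123',
  process_initiated_at: '2024-01-01T00:00:00Z',
  process_completed_at: '2024-01-01T00:05:00Z',
  step: 'completed',
  orders_count: 42,
  line_items_count: 97,
  error_text: null,
  file_path: 'raw/run-123.jsonl',
};

console.log(executionDurationMs(exampleRecord)); // 300000 (five minutes)
```

Returning null for a running execution keeps "not finished yet" distinct from "finished instantly", which matters for the staleness checks.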

Cloudflare Workers support structured logging: JSON written with console.log() is captured by Workers Logs:

console.log(JSON.stringify({
  level: 'info',
  event: 'procedure_execution_completed',
  run_id: runId,
  orders: count,
  duration_ms: Date.now() - startTime,
}));
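
A small helper can keep every log line in the same shape. The sketch below is one possible wrapper, not the pipeline's actual logger; the names formatLogEntry and logEvent and the set of levels are assumptions.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';

// Build one JSON log line. `level` and `event` are written last so that
// caller-supplied fields cannot overwrite them.
function formatLogEntry(
  level: LogLevel,
  event: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({ ...fields, level, event });
}

// Emit the line through the console method that matches the level.
function logEvent(
  level: LogLevel,
  event: string,
  fields: Record<string, unknown> = {},
): void {
  const line = formatLogEntry(level, event, fields);
  if (level === 'error') console.error(line);
  else if (level === 'warn') console.warn(line);
  else console.log(line);
}

logEvent('info', 'procedure_execution_completed', {
  run_id: 'run-123',
  orders: 42,
  duration_ms: 1870,
});
```

Separating formatting from emission makes the format easy to test without capturing console output.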

Logpush forwards logs to external destinations (e.g., Datadog, S3, R2) for retention beyond Workers’ built-in log viewer.

GET /api/health reports EngineeredSystem availability:

{
  "status": "ok",
  "database": "connected",
  "last_sync": {
    "run_id": "uuid",
    "process_completed_at": "ISO8601",
    "status": "completed"
  },
  "queue_depth": 0
}
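
The response body above could be assembled by a pure function along these lines. This is a sketch under assumptions: the 'degraded' and 'unreachable' values, the 503 status code, and the function name buildHealthReport do not come from the original and are illustrative only.

```typescript
// Hypothetical types mirroring the health response shown above.
interface LastSync {
  run_id: string;
  process_completed_at: string; // ISO 8601 timestamp
  status: string;
}

interface HealthReport {
  status: 'ok' | 'degraded';             // 'degraded' is an assumed value
  database: 'connected' | 'unreachable'; // 'unreachable' is an assumed value
  last_sync: LastSync | null;            // null if no execution has been recorded
  queue_depth: number;
}

function buildHealthReport(
  databaseReachable: boolean,
  lastSync: LastSync | null,
  queueDepth: number,
): { httpStatus: number; body: HealthReport } {
  const body: HealthReport = {
    status: databaseReachable ? 'ok' : 'degraded',
    database: databaseReachable ? 'connected' : 'unreachable',
    last_sync: lastSync,
    queue_depth: queueDepth,
  };
  // Assumption: a 503 lets uptime monitors detect an outage from the status code alone.
  return { httpStatus: databaseReachable ? 200 : 503, body };
}

const healthExample = buildHealthReport(
  true,
  { run_id: 'run-123', process_completed_at: '2024-01-01T00:05:00Z', status: 'completed' },
  0,
);
console.log(JSON.stringify(healthExample.body, null, 2));
```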

| Metric | Source | Alert threshold |
| --- | --- | --- |
| Daily sync duration | procedure_execution_record | > 15 minutes |
| Sync not completed | procedure_execution_record staleness | No completed execution in 26 hours |
| Queue retry count | Queue metrics | > 10 retries/hour |
| DLQ growth | Queue DLQ | Any message in DLQ (exhausted FallbackSteps) |
| API error rate | Workers analytics | > 5% 5xx in 5-minute window |
| Mart freshness | performance_measurement_dataset.last_refreshed | > 2 hours stale |
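
The thresholds in this table can be expressed as a single check over a metrics snapshot. The sketch below assumes the metrics have already been collected into one object; the MetricsSnapshot field names and the returned alert identifiers are hypothetical.

```typescript
// Hypothetical snapshot of the metrics listed in the table; field names are assumptions.
interface MetricsSnapshot {
  syncDurationMinutes: number;
  hoursSinceLastCompletedSync: number;
  queueRetriesLastHour: number;
  dlqMessageCount: number;
  serverErrorRate5Min: number; // fraction of 5xx responses in the last 5 minutes, 0..1
  hoursSinceMartRefresh: number;
}

// Returns the identifiers of every threshold the snapshot breaches, in table order.
function breachedThresholds(m: MetricsSnapshot): string[] {
  const breached: string[] = [];
  if (m.syncDurationMinutes > 15) breached.push('daily_sync_duration');
  if (m.hoursSinceLastCompletedSync >= 26) breached.push('sync_not_completed');
  if (m.queueRetriesLastHour > 10) breached.push('queue_retry_count');
  if (m.dlqMessageCount > 0) breached.push('dlq_growth');
  if (m.serverErrorRate5Min > 0.05) breached.push('api_error_rate');
  if (m.hoursSinceMartRefresh > 2) breached.push('mart_freshness');
  return breached;
}

const snapshot: MetricsSnapshot = {
  syncDurationMinutes: 18,
  hoursSinceLastCompletedSync: 3,
  queueRetriesLastHour: 2,
  dlqMessageCount: 2,
  serverErrorRate5Min: 0.01,
  hoursSinceMartRefresh: 0.5,
};

console.log(breachedThresholds(snapshot)); // [ 'daily_sync_duration', 'dlq_growth' ]
```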

Alerts are derived from metrics and surfaced via Cloudflare notifications or an external webhook:

  • Critical: ProcedureExecution failure, DLQ messages (exhausted FallbackSteps), database unreachable
  • Warning: Execution duration approaching limit, elevated error rate, stale Measurement Dataset
  • Info: Successful ProcedureExecution completion, large batch processed
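
One way to encode these severity levels is a lookup table consulted when building a notification payload. The event names, the payload shape, and the fallback for unknown events are all assumptions made for this sketch.

```typescript
type Severity = 'critical' | 'warning' | 'info';

// Hypothetical event names; the grouping follows the severity list above.
const SEVERITY_BY_EVENT: Record<string, Severity> = {
  procedure_execution_failed: 'critical',
  dlq_message_present: 'critical',
  database_unreachable: 'critical',
  execution_duration_near_limit: 'warning',
  elevated_error_rate: 'warning',
  measurement_dataset_stale: 'warning',
  procedure_execution_completed: 'info',
  large_batch_processed: 'info',
};

function severityOf(eventName: string): Severity {
  // Assumption: unknown events are treated as warnings so they are not silently dropped.
  return SEVERITY_BY_EVENT[eventName] ?? 'warning';
}

// Assumed payload shape for a webhook or notification body.
function buildNotification(eventName: string, detail: string) {
  return { severity: severityOf(eventName), event: eventName, detail };
}

console.log(buildNotification('database_unreachable', 'health check failed'));
```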

Every request gets a unique ID for correlation:

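app.use('*', async (c, next) => {
  // Reuse Cloudflare's cf-ray header when present; otherwise generate a UUID.
  const requestId = c.req.header('cf-ray') || crypto.randomUUID();
  // Store the ID on the context for downstream handlers and echo it to the client.
  c.set('requestId', requestId);
  c.header('X-Request-Id', requestId);
  await next();
});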

All log entries within a request include the request ID for end-to-end tracing.
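
A request-scoped logger is one way to guarantee that the ID appears on every entry. The sketch below is framework-independent; createRequestLogger, the request_id field name, and the injectable sink are assumptions, not part of the original code.

```typescript
type LogFields = Record<string, unknown>;
type LogSink = (line: string) => void;

// Returns a logger bound to one request ID. The sink defaults to console.log
// and can be replaced, for example to capture lines in tests.
function createRequestLogger(requestId: string, sink: LogSink = console.log) {
  return (
    level: 'info' | 'warn' | 'error',
    event: string,
    fields: LogFields = {},
  ): void => {
    // request_id is written last so caller-supplied fields cannot overwrite it.
    sink(JSON.stringify({ ...fields, level, event, request_id: requestId }));
  };
}

const requestLog = createRequestLogger('example-request-id');
requestLog('info', 'orders_fetched', { orders: 3 });
```

In the middleware, such a logger could be created once per request from the stored request ID, so handlers never have to pass the ID by hand.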