Frank Brano Gomez

Designing Resilient Workflow Engines

Patterns for building workflow engines that survive retries, partial failures, and long-running state.

1 min read
  • workflows
  • orchestration
  • reliability

Workflow engines fail in predictable ways. The trick is designing for those failures up front instead of patching them in production.

Idempotency First

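Every step in a workflow must be safely re-runnable, because the engine will retry after timeouts and crashes without knowing whether the previous attempt took effect. If a step charges a customer or sends an email, it needs an idempotency key so that a retry is recognized as a repeat instead of being executed a second time: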

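interface StepExecution {
  // Identifies the step within its workflow.
  readonly stepId: string;
  // Stays the same across retries, so repeated attempts can be deduplicated.
  readonly idempotencyKey: string;
  // Which attempt at this step the execution represents.
  readonly attempt: number;
}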
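
As a rough illustration of how such a key might be used, the sketch below wraps a side effect so that a retry carrying the same key returns the recorded result instead of running the effect again. The `runOnce` helper, the `completed` map, and `chargeCustomer` are hypothetical names invented for this example, and the in-memory map only stands in for a durable store.

```typescript
// Same shape as the interface in the article, repeated so the sketch is self-contained.
interface StepExecution {
  readonly stepId: string;
  readonly idempotencyKey: string;
  readonly attempt: number;
}

// Hypothetical stand-in for a durable table of results, keyed by idempotency key.
const completed = new Map<string, unknown>();

// Hypothetical helper: run the effect only if this key has not completed before.
function runOnce<T>(exec: StepExecution, effect: () => T): T {
  if (completed.has(exec.idempotencyKey)) {
    // A previous attempt already succeeded; return its recorded result.
    return completed.get(exec.idempotencyKey) as T;
  }
  const result = effect();
  completed.set(exec.idempotencyKey, result);
  return result;
}

// Hypothetical side effect that must not happen twice.
let charges = 0;
const chargeCustomer = (): string => {
  charges += 1;
  return `charge-${charges}`;
};

const first = runOnce(
  { stepId: "charge", idempotencyKey: "order-42:charge", attempt: 1 },
  chargeCustomer,
);
const retry = runOnce(
  { stepId: "charge", idempotencyKey: "order-42:charge", attempt: 2 },
  chargeCustomer,
);
```

Both calls return the same result and the customer is charged once. The sketch leaves a gap: if the process crashes after the effect runs but before the result is recorded, the retry will run the effect again. Closing it usually means passing the key to the downstream service or recording the result in the same transaction as the effect.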

Durable State

State transitions belong in a database, not in memory. A workflow that lives only in RAM dies with the process.
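
One way to picture this is an append-only log of transitions from which the current status can always be rebuilt. The sketch below is a minimal version of that idea under stated assumptions: `WorkflowStatus`, `TransitionRecord`, `StateStore`, `InMemoryStore`, `currentStatus`, and `transition` are all hypothetical names, and the in-memory class exists only so the example runs. A real engine would back the same interface with a database.

```typescript
// Hypothetical set of workflow statuses for this sketch.
type WorkflowStatus = "pending" | "running" | "completed" | "failed";

interface TransitionRecord {
  readonly workflowId: string;
  readonly from: WorkflowStatus;
  readonly to: WorkflowStatus;
  readonly at: string; // ISO-8601 timestamp
}

// Hypothetical storage interface; a database-backed class would implement it.
interface StateStore {
  append(record: TransitionRecord): void;
  history(workflowId: string): readonly TransitionRecord[];
}

// In-memory stand-in used only to make the sketch runnable.
class InMemoryStore implements StateStore {
  private readonly records: TransitionRecord[] = [];

  append(record: TransitionRecord): void {
    this.records.push(record);
  }

  history(workflowId: string): readonly TransitionRecord[] {
    return this.records.filter((r) => r.workflowId === workflowId);
  }
}

// The current status is derived from the log, never held in a local variable.
function currentStatus(store: StateStore, workflowId: string): WorkflowStatus {
  const records = store.history(workflowId);
  const last = records[records.length - 1];
  return last ? last.to : "pending";
}

// Record a transition by appending to the store.
function transition(store: StateStore, workflowId: string, to: WorkflowStatus): void {
  const from = currentStatus(store, workflowId);
  store.append({ workflowId, from, to, at: new Date().toISOString() });
}

const store = new InMemoryStore();
transition(store, "wf-1", "running");

// A worker that restarts holds no local state, but can recover it from the store.
const recovered = currentStatus(store, "wf-1");
```

Because every caller reads status through the store, swapping `InMemoryStore` for a database-backed implementation would let a restarted process pick up where the previous one stopped.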

Explicit Compensation

When step four fails, steps one through three may need undoing. Model compensation as first-class steps, not as exception handlers.
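
A minimal sketch of this idea, often called the saga pattern, pairs every step with its own compensation and runs the compensations in reverse order when a later step fails. `SagaStep`, `runSaga`, `makeStep`, and the four step names are hypothetical, chosen only to mirror the four-step scenario above.

```typescript
// Hypothetical shape: each step carries its own undo action.
interface SagaStep {
  readonly name: string;
  run(): void;
  compensate(): void;
}

interface SagaOutcome {
  readonly ok: boolean;
  readonly log: readonly string[];
}

// Run steps in order; on failure, compensate completed steps in reverse order.
function runSaga(steps: readonly SagaStep[]): SagaOutcome {
  const log: string[] = [];
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      log.push(`run:${step.name}`);
      done.push(step);
    } catch {
      log.push(`fail:${step.name}`);
      for (const prev of [...done].reverse()) {
        prev.compensate();
        log.push(`undo:${prev.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Hypothetical factory for demo steps; `fails` forces the step to throw.
function makeStep(name: string, fails = false): SagaStep {
  return {
    name,
    run: () => {
      if (fails) throw new Error(`${name} failed`);
    },
    compensate: () => {
      // A real compensation would release stock, refund a charge, and so on.
    },
  };
}

const outcome = runSaga([
  makeStep("reserve"),
  makeStep("charge"),
  makeStep("ship"),
  makeStep("notify", true), // step four fails
]);
```

Here `outcome.log` records the three successful steps, the failure of `notify`, and then the compensations for `ship`, `charge`, and `reserve` in that order. The sketch assumes compensations themselves succeed. In practice they are workflow steps too, so they need the same idempotency and durable-state treatment as forward steps.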

Placeholder post — replace with a real article before launch.