MIT · Open Source

workflow

Stage-based pipeline engine with checkpoint resume. Run pipelines over thousands of items. If it crashes, it picks up where it left off.

Node.js · Python · Go
per-stage concurrency
fs checkpoint store
resume after crash
0 runtime deps
$ npm install @probeo/workflow

What it does

workflow runs a sequence of stages over a list of items. Each item moves through the stages independently, with configurable concurrency at each stage. Progress is checkpointed to the filesystem after each item completes a stage. If the process crashes or is killed, it resumes from the last checkpoint — not from the beginning.
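The per-stage concurrency described above boils down to a bounded promise pool: no more than N items of a stage in flight at once. A minimal sketch of that idea (the `runStage` helper is hypothetical, not the library's actual internals):

```javascript
// Run `fn` over `items` with at most `limit` calls in flight.
// Hypothetical sketch of per-stage concurrency, not the real engine.
async function runStage(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until the list is drained.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results; // results stay in input order
}
```

Because Node is single-threaded, the shared `next` counter needs no locking; each stage simply gets its own pool sized to its `concurrency` setting.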

Retries use exponential backoff. Failures are logged per-item, not pipeline-wide. A single item failing doesn't stop the run.
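The retry behavior amounts to wrapping each item's stage call in an exponential-backoff loop. A sketch of the technique (option names like `attempts` and `baseDelayMs` are illustrative, not the library's actual config keys):

```javascript
// Retry `fn` up to `attempts` times, doubling the delay each failure.
// Hypothetical sketch; @probeo/workflow's real option names may differ.
async function withRetry(fn, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastErr;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // 100ms, 200ms, 400ms, ... — exponential backoff
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  // Out of attempts: the item is marked errored, but the run continues.
  throw lastErr;
}
```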

Usage

Node.js
import { workflow } from '@probeo/workflow';

const pipeline = workflow({
  name: 'content-pipeline',
  checkpoint: './.checkpoints',
  stages: [
    {
      name: 'fetch',
      concurrency: 10,
      run: async (item) => {
        const html = await fetch(item.url).then(r => r.text());
        return { ...item, html };
      },
    },
    {
      name: 'analyze',
      concurrency: 5,
      run: async (item) => {
        const signals = await extractSignals(item.html);
        return { ...item, signals };
      },
    },
    {
      name: 'store',
      concurrency: 20,
      run: async (item) => {
        await db.upsert(item);
      },
    },
  ],
});

const items = await getUrls(); // e.g. 50,000 pages
await pipeline.run(items);
// Crashes partway through? Run again — it skips completed items.

Key behaviors

  • Per-item checkpoints. Each item's stage completion is written to disk. Resuming re-reads the checkpoint and skips already-completed work.
  • Per-stage concurrency. IO-bound stages (fetch) can run at higher concurrency than CPU-bound stages (analyze).
  • Exponential backoff. Failed items are retried with configurable delay and max attempts before being marked as errored.
  • Zero runtime dependencies. The checkpoint store is plain JSON on disk. No Redis, no database, no broker required.

Used in Probeo

workflow was extracted from Probeo, where it orchestrates multi-stage pipelines in production.