Testing Locally

The framework is designed so that "test the agent" doesn't mean "spin up a full LLM." Use the mock provider for fast, deterministic checks, and reach for live models only when the behavior under test depends on the model.

Mock model

fh run ask --model mock/test-model --question "hi"

mock/test-model returns deterministic responses and respects typed result schemas. Combine it with snapshot testing or schema-only assertions in your test suite.

Vitest example

import { describe, expect, it } from 'vitest';
import { runAgent } from '@fabric-harness/node';

describe('ask agent', () => {
  it('returns a string', async () => {
    const { result } = await runAgent({
      agent: 'ask',
      payload: { question: 'hello' },
      model: 'mock/test-model',
    });
    expect(typeof result).toBe('string');
  });
});
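
A schema-only assertion checks the shape of the result rather than its exact wording, so the test keeps passing if the mock's response text changes. The following is a minimal sketch, assuming a hypothetical typed result with an `answer` string and a `sources` array; the `AskResult` interface and the `isAskResult` guard are illustrative names, not part of the framework.

```typescript
// Hypothetical typed result for an "ask" agent; not a framework type.
interface AskResult {
  answer: string;
  sources: string[];
}

// Type guard that checks shape only, never the exact wording of the answer.
function isAskResult(value: unknown): value is AskResult {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const sources = v['sources'];
  return (
    typeof v['answer'] === 'string' &&
    Array.isArray(sources) &&
    sources.every((s: unknown) => typeof s === 'string')
  );
}

// Stand-in for a value returned by a mock run (assumed shape).
const sample: unknown = { answer: 'mock answer', sources: [] };
const sampleIsValid = isAskResult(sample);
```

Inside a Vitest test, the guard could be used as `expect(isAskResult(result)).toBe(true)` in place of, or alongside, a snapshot.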

Doctor

fh doctor --tools           # binary checks
fh doctor --live --model openai/gpt-5.5  # one round-trip

Dev server

fh dev starts the same routes the deployed Node target uses, so you can hit POST /agents/:agent/:id with curl and verify webhook payloads:

fh dev --port 4000
curl -X POST -H 'Content-Type: application/json' \
  -d '{"question":"What is Temporal?"}' \
  http://localhost:4000/agents/ask/run-001
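
The same request can be built from a test file instead of the shell. This is a rough sketch that only mirrors the curl call above: the route comes from POST /agents/:agent/:id, while `RunRequest` and `buildRunRequest` are hypothetical helper names introduced for illustration.

```typescript
// Plain description of a request against the dev server (illustrative shape).
interface RunRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Builds the POST /agents/:agent/:id request without sending it.
function buildRunRequest(
  baseUrl: string,
  agent: string,
  id: string,
  payload: unknown,
): RunRequest {
  // Drop trailing slashes so the path is not doubled up.
  const base = baseUrl.replace(/\/+$/, '');
  return {
    url: `${base}/agents/${encodeURIComponent(agent)}/${encodeURIComponent(id)}`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  };
}

const req = buildRunRequest('http://localhost:4000/', 'ask', 'run-001', {
  question: 'What is Temporal?',
});
```

With fh dev running, the request could then be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.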

Recipes from `examples/`

The repo's examples/ directory is the canonical reference for how to test each capability:

examples/hello-world — basic metadata agents and real model invocation.
examples/with-tools — built-in tools.
examples/with-skill — skill loading and typed results.
examples/with-task — durable task lifecycle and artifacts.
examples/with-approval — approval-gated commands.
examples/with-docker — Docker sandbox basics.
examples/with-temporal — Temporal worker integration.
examples/with-config — central config including SQLite session storage.
examples/with-postgres-store — Postgres session/artifact storage.
examples/data-analyst — Docker-backed CSV analysis with artifacts.
examples/issue-triage-ci — controlled CI pilot for read-only GitHub issue triage.

What to assert in tests

For metadata agents, the most useful assertions are:

Schema shape. Output validates against the declared output schema.
Tool/command scope. No unexpected commands ran.
Artifacts. Expected artifacts were published with the right content type.
Metrics. Token and call counts stay within bounds for a given task.
Idempotence. Re-running the same prompt produces compatible output (when using the mock model or a fixed seed).
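
Several of these assertions can be grouped into one helper that a test calls after each run. The sketch below assumes a hypothetical `RunRecord` shape (commands, artifacts, metrics); the framework's actual run output may be structured differently, and `checkRun` and `Expectations` are illustrative names only.

```typescript
// Hypothetical record of one agent run; the real shape may differ.
interface RunRecord {
  commands: string[];
  artifacts: { name: string; contentType: string }[];
  metrics: { tokens: number; modelCalls: number };
}

interface Expectations {
  allowedCommands: string[];
  artifactTypes: Record<string, string>; // artifact name -> expected content type
  maxTokens: number;
  maxModelCalls: number;
}

// Returns human-readable violations; an empty list means the run passed.
function checkRun(run: RunRecord, expected: Expectations): string[] {
  const problems: string[] = [];

  // Tool/command scope: every command must be on the allow-list.
  const allowed = new Set(expected.allowedCommands);
  for (const cmd of run.commands) {
    if (!allowed.has(cmd)) problems.push(`unexpected command: ${cmd}`);
  }

  // Artifacts: each expected artifact exists with the right content type.
  for (const [name, type] of Object.entries(expected.artifactTypes)) {
    const found = run.artifacts.find((a) => a.name === name);
    if (!found) problems.push(`missing artifact: ${name}`);
    else if (found.contentType !== type) {
      problems.push(`wrong content type for ${name}: ${found.contentType}`);
    }
  }

  // Metrics: token and call counts stay within the budget.
  if (run.metrics.tokens > expected.maxTokens) {
    problems.push(`token count ${run.metrics.tokens} exceeds ${expected.maxTokens}`);
  }
  if (run.metrics.modelCalls > expected.maxModelCalls) {
    problems.push(`model calls ${run.metrics.modelCalls} exceed ${expected.maxModelCalls}`);
  }
  return problems;
}

const expected: Expectations = {
  allowedCommands: ['ls', 'cat'],
  artifactTypes: { 'report.csv': 'text/csv' },
  maxTokens: 2000,
  maxModelCalls: 5,
};

const goodRun: RunRecord = {
  commands: ['ls'],
  artifacts: [{ name: 'report.csv', contentType: 'text/csv' }],
  metrics: { tokens: 1200, modelCalls: 3 },
};
```

In a Vitest test this could be asserted as `expect(checkRun(run, expected)).toEqual([])`, so a failure prints every violation at once rather than stopping at the first.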