Building Agents
Testing Locally
Mock model, doctor, dev server, examples, and assertions.
The framework is designed so that "test the agent" doesn't mean "spin up a full LLM." Use the mock provider for fast, deterministic checks and reach for live models only when behavior depends on the model.
Mock model
fh run ask --model mock/test-model --question "hi"

mock/test-model returns deterministic responses and respects typed result schemas. Combine it with snapshot testing or schema-only assertions in your test suite.
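Determinism is what makes the mock provider safe to snapshot: identical input always maps to identical output, so stored snapshots never drift between runs. A self-contained sketch of the idea — not the framework's actual mock implementation, and `mockComplete` is a hypothetical name:

```typescript
// Hypothetical sketch of a deterministic mock completion. The real
// mock/test-model provider is framework-internal; this only illustrates
// why deterministic output enables snapshot and equality assertions.
import { createHash } from 'node:crypto';

// Derive a stable fake "completion" from the prompt alone: no randomness,
// no clock, no network, so the mapping is fixed across runs.
function mockComplete(prompt: string): string {
  const digest = createHash('sha256').update(prompt).digest('hex').slice(0, 8);
  return `mock-response:${digest}`;
}

// Two calls with the same prompt agree byte-for-byte.
const a = mockComplete('hi');
const b = mockComplete('hi');
console.log(a === b); // true
```

Because the output is a pure function of the input, a snapshot recorded once stays valid until the prompt itself changes.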
Vitest example
import { describe, expect, it } from 'vitest';
import { runAgent } from '@fabric-harness/node';

describe('ask agent', () => {
  it('returns a string', async () => {
    const { result } = await runAgent({
      agent: 'ask',
      payload: { question: 'hello' },
      model: 'mock/test-model',
    });
    expect(typeof result).toBe('string');
  });
});

Doctor
fh doctor --tools # binary checks
fh doctor --live --model openai/gpt-5.5 # one round-trip

Dev server
fh dev starts the same routes the deployed Node target uses, so you can hit POST /agents/:agent/:id with curl and verify webhook payloads:
fh dev --port 4000
curl -X POST -H 'Content-Type: application/json' \
-d '{"question":"What is Temporal?"}' \
http://localhost:4000/agents/ask/run-001

Recipes from examples/
The repo's examples/ directory is the canonical reference for how to test each capability:
- examples/hello-world — basic metadata agents and real model invocation.
- examples/with-tools — built-in tools.
- examples/with-skill — skill loading and typed results.
- examples/with-task — durable task lifecycle and artifacts.
- examples/with-approval — approval-gated commands.
- examples/with-docker — Docker sandbox basics.
- examples/with-temporal — Temporal worker integration.
- examples/with-config — central config including SQLite session storage.
- examples/with-postgres-store — Postgres session/artifact storage.
- examples/data-analyst — Docker-backed CSV analysis with artifacts.
- examples/issue-triage-ci — controlled CI pilot for read-only GitHub issue triage.
What to assert in tests
For metadata agents, the most useful assertions are:
- Schema shape. Output validates against the declared output schema.
- Tool/command scope. No unexpected commands ran.
- Artifacts. Expected artifacts were published with the right content type.
- Metrics. Token / call counts stay within bounds for a given task.
- Idempotence. Re-running the same prompt produces compatible output (when using the mock model or fixed seed).