Testing Locally

Mock model, doctor, dev server, examples, and assertions.

The framework is designed so that "test the agent" doesn't mean "spin up a full LLM." Use the mock provider for fast, deterministic checks and reach for live models only when behavior depends on the model.

Mock model

fh run ask --model openai/gpt-5.5 --question "hi"

openai/gpt-5.5 returns deterministic responses and respects typed result schemas. Combine it with snapshot testing or schema-only assertions in your test suite.
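A schema-only assertion checks the shape of a result rather than its exact wording. The following is a minimal, dependency-free sketch of that idea. The `SummaryResult` shape, the `isSummaryResult` guard, and the sample objects are hypothetical and not part of Fabric Harness; a real suite would more likely validate against the agent's declared schema.

```typescript
// Hypothetical result shape, for illustration only; a real agent declares its own schema.
interface SummaryResult {
  answer: string;
  sources: string[];
}

// Type guard that checks shape only, never the exact wording of the answer.
function isSummaryResult(value: unknown): value is SummaryResult {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.answer === 'string' &&
    Array.isArray(candidate.sources) &&
    candidate.sources.every((s) => typeof s === 'string')
  );
}

// Stand-ins for what a run might return.
const wellFormed: unknown = { answer: 'hello', sources: ['doc-1'] };
const malformed: unknown = { answer: 42, sources: [] };

console.log(isSummaryResult(wellFormed)); // true
console.log(isSummaryResult(malformed)); // false
```

Inside a test, the same guard could back an assertion such as `expect(isSummaryResult(result)).toBe(true)`, which stays stable even if the response text changes.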

Vitest example

import { describe, expect, it } from 'vitest';
import { runAgent } from '@fabric-harness/node';

describe('ask agent', () => {
  it('returns a string', async () => {
    const { result } = await runAgent({
      agent: 'ask',
      payload: { question: 'hello' },
      model: 'openai/gpt-5.5',
    });
    expect(typeof result).toBe('string');
  });
});

Doctor

fh doctor --tools           # binary checks
fh doctor --live --model openai/gpt-5.5  # one round-trip

Dev server

fh dev serves the same routes as the deployed Node target, so you can send requests to POST /agents/:agent/:id with curl and verify webhook payloads:

fh dev --port 4000
curl -X POST -H 'Content-Type: application/json' \
  -d '{"question":"What is Temporal?"}' \
  http://localhost:4000/agents/ask/run-001
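The same request can be prepared from a TypeScript test. This sketch only builds the URL and request options for the POST /agents/:agent/:id route shown above; `buildRunRequest` is a hypothetical helper, not a Fabric Harness API, and the response format is not assumed here.

```typescript
// Builds the URL and fetch options for POST /agents/:agent/:id.
// The route shape comes from the curl example; the helper name is hypothetical.
function buildRunRequest(
  baseUrl: string,
  agent: string,
  runId: string,
  payload: unknown,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  // Drop trailing slashes so the joined path has exactly one separator.
  const root = baseUrl.replace(/\/+$/, '');
  const url = `${root}/agents/${encodeURIComponent(agent)}/${encodeURIComponent(runId)}`;
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

const request = buildRunRequest('http://localhost:4000/', 'ask', 'run-001', {
  question: 'What is Temporal?',
});
console.log(request.url);
// With `fh dev --port 4000` running, the request could be sent with:
//   const response = await fetch(request.url, request.init);
```

Keeping request construction separate from the network call makes the URL and body easy to check without a running server.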

Live integration tests

Live tests are opt-in and skipped by default. Use them when validating real provider credentials and hosted resources:

pnpm --filter @fabric-harness/connectors test   # Daytona / E2B / Modal live suites skip unless enabled
pnpm --filter @fabric-harness/azure test        # Azure OpenAI / Foundry / ARM live suites skip unless enabled
pnpm --filter @fabric-harness/databricks test   # Databricks live suites skip unless enabled

See Live tests for the full environment variable matrix.
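As an illustration of the opt-in pattern, a suite can check an environment flag and skip itself unless the flag is set. This is a sketch only: `liveTestsEnabled` is a hypothetical helper and `EXAMPLE_LIVE_TESTS` is a made-up variable name; the real variables are the ones listed in the Live tests page.

```typescript
// Returns true only when the flag is explicitly set to '1' or 'true'.
// Anything else, including an unset variable, keeps live tests disabled.
function liveTestsEnabled(env: Record<string, string | undefined>, flag: string): boolean {
  const value = (env[flag] ?? '').trim().toLowerCase();
  return value === '1' || value === 'true';
}

// EXAMPLE_LIVE_TESTS is a placeholder name, not a real Fabric Harness variable.
console.log(liveTestsEnabled({}, 'EXAMPLE_LIVE_TESTS')); // false
console.log(liveTestsEnabled({ EXAMPLE_LIVE_TESTS: '1' }, 'EXAMPLE_LIVE_TESTS')); // true
```

In a Vitest file, the result could feed `describe.skipIf(!liveTestsEnabled(process.env, 'EXAMPLE_LIVE_TESTS'))`, so the suite is reported as skipped rather than failing when credentials are absent.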

Recipes from examples/

The repo's examples/ directory is the canonical reference for how to test each capability.

What to assert in tests

For metadata agents, the most useful assertions are:

  1. Schema shape. Output validates against the declared output schema.
  2. Tool/command scope. No unexpected commands ran.
  3. Artifacts. Expected artifacts were published with the right content type.
  4. Metrics. Token / call counts stay within bounds for a given task.
  5. Idempotence. Re-running the same prompt produces compatible output (when using the mock model or a fixed seed).
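The scope, artifact, and metrics checks above can be written as small pure helpers. The sketch below assumes a hypothetical `RunRecord` shape; the actual run result may expose commands, artifacts, and metrics differently, so treat the field names as placeholders.

```typescript
// Hypothetical record of one run; the real result object may expose these differently.
interface RunRecord {
  commands: string[];
  artifacts: { name: string; contentType: string }[];
  metrics: { tokens: number; modelCalls: number };
}

// Tool/command scope: returns every command outside the allow-list.
function unexpectedCommands(run: RunRecord, allowed: string[]): string[] {
  const allowedSet = new Set(allowed);
  return run.commands.filter((command) => !allowedSet.has(command));
}

// Artifacts: true when an artifact with this name and content type was published.
function hasArtifact(run: RunRecord, name: string, contentType: string): boolean {
  return run.artifacts.some((a) => a.name === name && a.contentType === contentType);
}

// Metrics: true when token and call counts stay within the given limits.
function withinBudget(run: RunRecord, maxTokens: number, maxCalls: number): boolean {
  return run.metrics.tokens <= maxTokens && run.metrics.modelCalls <= maxCalls;
}

// Sample data standing in for a recorded run.
const sampleRun: RunRecord = {
  commands: ['git status', 'ls'],
  artifacts: [{ name: 'report.json', contentType: 'application/json' }],
  metrics: { tokens: 1200, modelCalls: 3 },
};

console.log(unexpectedCommands(sampleRun, ['git status'])); // only 'ls' is out of scope
console.log(hasArtifact(sampleRun, 'report.json', 'application/json')); // true
console.log(withinBudget(sampleRun, 2000, 5)); // true
```

Returning the list of offending commands, rather than a boolean, gives a failing test a more useful message.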