Rate limiting

Token-bucket throttling for outbound provider calls, keyed per API key.

When many sessions / agents share a process, they can stampede a single provider API key — bursting past quota and triggering 429s. fabric-harness ships a generic token-bucket primitive that throttles outbound calls per key.

Configure on a provider

import { OpenAICompatibleModelProvider, tokenBucketRateLimiter } from '@fabric-harness/sdk';

const limiter = tokenBucketRateLimiter({
  tokensPerSecond: 50,    // sustained throughput
  burst: 200,             // peak capacity
});

const provider = new OpenAICompatibleModelProvider({
  baseUrl: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  defaultModel: 'gpt-4o',
  rateLimiter: limiter,
});

Both OpenAICompatibleModelProvider and AnthropicModelProvider accept rateLimiter. Every generate and stream call awaits limiter.acquire(<bucketKey>) before sending the request.
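To get a feel for how tokensPerSecond and burst interact, the arithmetic below estimates when the last of a batch of simultaneous calls would be released. It is a back-of-the-envelope sketch, assuming the bucket starts full, refills continuously, and charges one token per call. estimateDrainMs is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper (not part of @fabric-harness/sdk): estimates how long
// it takes for `calls` simultaneous requests to all pass a token bucket.
// Assumes the bucket starts full and each call costs one token.
function estimateDrainMs(calls: number, tokensPerSecond: number, burst: number): number {
  // The first `burst` calls consume the tokens already in the bucket.
  const queued = Math.max(0, calls - burst);
  // Each remaining call waits for one freshly refilled token.
  return (queued * 1000) / tokensPerSecond;
}

// With the configuration above (50 tokens/s, burst of 200):
console.log(estimateDrainMs(200, 50, 200));   // 0: the whole burst passes at once
console.log(estimateDrainMs(1000, 50, 200));  // 16000: 800 queued calls at 50/s
```

In other words, burst sets how many calls can go out immediately, while tokensPerSecond sets the pace once that headroom is spent.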

Bucket keys

The provider hashes the API key into a short opaque string (<providerName>:<sha256-prefix-8>) and uses that as the bucket key. Practical implications:

  • Two providers using the same API key share the same bucket — quota stays honest.
  • Two providers with different API keys (e.g. OpenAI Tier 4 + Tier 1) get separate buckets automatically.
  • Raw keys never appear in logs or metric labels — only the hash prefix.
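The key shape described above can be illustrated with Node's built-in crypto module. This is only a sketch: bucketKeyFor is a hypothetical helper, and it assumes that "prefix-8" means the first 8 hex characters of the SHA-256 digest. The SDK's actual derivation may differ.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper illustrating the <providerName>:<sha256-prefix-8> shape.
// Assumption: "prefix-8" means the first 8 hex characters of the digest.
function bucketKeyFor(providerName: string, apiKey: string): string {
  const digest = createHash('sha256').update(apiKey).digest('hex');
  return `${providerName}:${digest.slice(0, 8)}`;
}

const first = bucketKeyFor('openai', 'sk-example-one');
const same = bucketKeyFor('openai', 'sk-example-one');
const other = bucketKeyFor('openai', 'sk-example-two');

console.log(first === same);   // true: same API key, same bucket
console.log(first === other);  // false unless the two digests share a prefix
```

Because only a short digest prefix leaves the process, the key is safe to use as a log field or metric label.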

Reusing the limiter elsewhere

tokenBucketRateLimiter() returns a generic RateLimiter. Use it inside connectors, webhook fan-out, or anywhere you want simple throttling:

const httpLimiter = tokenBucketRateLimiter({ tokensPerSecond: 10, burst: 20 });

async function callExternalApi(url: string) {
  await httpLimiter.acquire(`api:${new URL(url).host}`);
  return fetch(url);
}

A pending acquire aborts mid-wait when an AbortSignal is passed to acquire({ signal }).
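For readers unfamiliar with the pattern, the sketch below shows how an abortable wait is typically built on AbortSignal. abortableDelay is a hypothetical stand-in for the limiter's internal wait, not the SDK's implementation, and the error messages are made up for illustration.

```typescript
// Hypothetical sketch of an abortable wait; not the SDK's implementation.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    // Fail fast if the caller has already given up.
    if (signal?.aborted) {
      reject(new Error('aborted before waiting'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);              // stop the pending wait
      reject(new Error('aborted mid-wait'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

const controller = new AbortController();
const pending = abortableDelay(10_000, controller.signal);
controller.abort();                     // e.g. the user cancelled the session
pending.catch((err: Error) => console.log(err.message)); // "aborted mid-wait"
```

Clearing the timer on abort matters in long-lived processes: without it, every cancelled call would leave a timer running until its original deadline.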

Cross-process throttling (Redis)

@fabric-harness/node ships redisRateLimiter() for fleets that share an API key across multiple containers, autoscaled workers, or regions. It implements the same RateLimiter interface and performs the refill atomically in a Lua script on the Redis side.

import { OpenAICompatibleModelProvider } from '@fabric-harness/sdk';
import { redisRateLimiter } from '@fabric-harness/node';
import Redis from 'ioredis';   // or @upstash/redis — only `eval` is required

const limiter = redisRateLimiter({
  client: new Redis(process.env.REDIS_URL!),
  tokensPerSecond: 100,
  burst: 500,
  keyPrefix: 'fh:rl',         // optional; defaults to fh:rl
});

const provider = new OpenAICompatibleModelProvider({
  baseUrl: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  rateLimiter: limiter,
});

Compatible clients: anything that exposes eval(script, keys, args) matching the standard Redis Lua surface. Tested against ioredis and @upstash/redis. fabric-harness has no dependency on either — bring your own.

Atomicity: every acquire runs a single Lua script that reads the bucket, refills based on elapsed time, and either consumes tokens or returns the wait duration. No race window.
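The read, refill, and consume-or-wait sequence can be modelled as a pure function. The TypeScript below is a hypothetical model of that logic, not the shipped Lua script: the state layout, the names, and the one-token-per-call cost are all assumptions.

```typescript
// Hypothetical model of the read / refill / consume-or-wait step.
// Names and state layout are assumptions, not the shipped Lua script.
interface BucketState {
  tokens: number;       // tokens available as of `updatedAtMs`
  updatedAtMs: number;  // timestamp of the last refill
}

interface AcquireResult {
  state: BucketState;   // state to write back
  waitMs: number;       // 0 when a token was consumed
}

function acquireStep(
  state: BucketState,
  nowMs: number,
  tokensPerSecond: number,
  burst: number,
): AcquireResult {
  // Refill based on elapsed time, capped at the burst capacity.
  const elapsedMs = Math.max(0, nowMs - state.updatedAtMs);
  const tokens = Math.min(burst, state.tokens + (elapsedMs * tokensPerSecond) / 1000);

  if (tokens >= 1) {
    // Enough budget: consume one token and let the caller proceed.
    return { state: { tokens: tokens - 1, updatedAtMs: nowMs }, waitMs: 0 };
  }
  // Not enough budget: report how long until one full token is available.
  const waitMs = Math.ceil(((1 - tokens) * 1000) / tokensPerSecond);
  return { state: { tokens, updatedAtMs: nowMs }, waitMs };
}

// An empty bucket at 100 tokens/s needs 10 ms to earn its next token.
const empty: BucketState = { tokens: 0, updatedAtMs: 0 };
console.log(acquireStep(empty, 0, 100, 500).waitMs);   // 10
console.log(acquireStep(empty, 10, 100, 500).waitMs);  // 0
```

If the read and the write-back were separate round trips, two workers could both observe the same token and both consume it. Running the whole step as one Redis script is what rules that out.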

Bucket TTL: Redis keys auto-expire 60s after the bucket would be fully refilled, so abandoned bucket keys (e.g. one-off API keys) eventually disappear.
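As a worked example of that expiry rule, the helper below computes a TTL as the time needed to refill completely plus the 60-second grace period. bucketTtlSeconds is hypothetical, and rounding the refill time up to whole seconds is an assumption rather than documented behaviour.

```typescript
// Hypothetical helper: seconds until a bucket key would expire, assuming
// TTL = time to refill completely + a 60 s grace period.
function bucketTtlSeconds(tokens: number, tokensPerSecond: number, burst: number): number {
  const secondsToFull = Math.max(0, burst - tokens) / tokensPerSecond;
  return Math.ceil(secondsToFull) + 60;
}

// With the Redis configuration above (100 tokens/s, burst of 500):
console.log(bucketTtlSeconds(0, 100, 500));    // 65: 5 s to refill + 60 s
console.log(bucketTtlSeconds(500, 100, 500));  // 60: already full
```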
