Rate limiting
Token-bucket throttling for outbound provider calls, keyed per API key.
When many sessions or agents share a process, they can stampede a single provider API key, bursting past its quota and triggering 429 responses. fabric-harness ships a generic token-bucket primitive that throttles outbound calls per key.
Configure on a provider
import { OpenAICompatibleModelProvider, tokenBucketRateLimiter } from '@fabric-harness/sdk';
const limiter = tokenBucketRateLimiter({
  tokensPerSecond: 50, // sustained throughput
  burst: 200, // peak capacity
});
const provider = new OpenAICompatibleModelProvider({
  baseUrl: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  defaultModel: 'gpt-4o',
  rateLimiter: limiter,
});
Both OpenAICompatibleModelProvider and AnthropicModelProvider accept rateLimiter. Every request (generate and stream alike) awaits limiter.acquire(<bucketKey>) before it is sent.
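To make the two settings concrete, here is a small worked calculation. It is a sketch under stated assumptions, not fabric-harness code: `secondsToAdmit` is a hypothetical helper, the bucket is assumed to start full, every call is assumed to cost exactly one token, and all calls are assumed to arrive at the same instant.

```typescript
// Hypothetical helper (not part of fabric-harness): estimates how long it
// takes a token bucket to admit `calls` requests that all arrive at once.
// Assumes the bucket starts full and each call costs one token.
function secondsToAdmit(calls: number, tokensPerSecond: number, burst: number): number {
  // The first `burst` calls use tokens that are already in the bucket.
  const queued = Math.max(0, calls - burst);
  // Each remaining call has to wait for one freshly refilled token.
  return queued / tokensPerSecond;
}

// With the configuration above (50 tokens/s, burst of 200):
console.log(secondsToAdmit(150, 50, 200));  // 0  (fits inside the burst)
console.log(secondsToAdmit(1000, 50, 200)); // 16 (800 queued calls at 50/s)
```

In other words, burst decides how many calls pass without waiting, and tokensPerSecond decides how quickly the backlog drains afterwards.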
Bucket keys
The provider hashes the API key into a short opaque string (<providerName>:<sha256-prefix-8>) and uses that as the bucket key. Practical implications:
- Two providers using the same API key share the same bucket — quota stays honest.
- Two providers with different API keys (e.g. OpenAI Tier 4 + Tier 1) get separate buckets automatically.
- Raw keys never appear in logs or metric labels — only the hash prefix.
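The key derivation described above could look roughly like the following. This is an illustrative sketch only: `bucketKeyFor` is a hypothetical name, and it assumes that "sha256-prefix-8" means the first 8 hex characters of the SHA-256 digest of the raw API key. The actual derivation inside the providers may differ.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper, not the library's implementation.
// Assumption: the prefix is the first 8 hex characters of the digest.
function bucketKeyFor(providerName: string, apiKey: string): string {
  const digest = createHash('sha256').update(apiKey, 'utf8').digest('hex');
  return `${providerName}:${digest.slice(0, 8)}`;
}

// Placeholder keys for illustration only.
const first = bucketKeyFor('openai', 'sk-example-1');
const second = bucketKeyFor('openai', 'sk-example-1');

console.log(first === second);              // true: same API key, same bucket
console.log(first.includes('sk-example'));  // false: the raw key is not exposed
```

Because the hash is deterministic, two provider instances built from the same API key land on the same bucket without any coordination between them.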
Reusing the limiter elsewhere
tokenBucketRateLimiter() returns a generic RateLimiter. Use it inside connectors, webhook fan-out, or anywhere you want simple throttling:
const httpLimiter = tokenBucketRateLimiter({ tokensPerSecond: 10, burst: 20 });
async function callExternalApi(url: string) {
  // One bucket per remote host
  await httpLimiter.acquire(`api:${new URL(url).host}`);
  return fetch(url);
}
A pending acquire aborts mid-wait when an AbortSignal is passed to acquire({ signal }).
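The abort-while-waiting behavior can be illustrated with a plain timer. The sketch below is not the limiter's internals: `abortableDelay` is a hypothetical helper that shows one common way to make a wait cancellable with a standard AbortSignal.

```typescript
// Hypothetical helper: resolves after `ms`, or rejects early with the
// signal's abort reason if `signal` fires first.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(signal.reason);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop waiting so the process is not held open
      reject(signal?.reason);
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}

// Usage: cancel a long wait, for example when the caller's request is dropped.
const controller = new AbortController();
const pending = abortableDelay(60_000, controller.signal);
controller.abort();
pending.catch(() => console.log('wait cancelled'));
```

Clearing the timer on abort matters in long-lived workers: without it, every cancelled wait would keep a timer alive until it expired.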
Cross-process throttling (Redis)
@fabric-harness/node ships redisRateLimiter() for fleets that share an API key across multiple containers, autoscaled workers, or regions. It implements the same RateLimiter interface, with the refill performed atomically by a Lua script on the Redis side.
import { OpenAICompatibleModelProvider } from '@fabric-harness/sdk';
import { redisRateLimiter } from '@fabric-harness/node';
import Redis from 'ioredis'; // or @upstash/redis; only `eval` is required
const limiter = redisRateLimiter({
  client: new Redis(process.env.REDIS_URL!),
  tokensPerSecond: 100,
  burst: 500,
  keyPrefix: 'fh:rl', // optional; defaults to fh:rl
});
const provider = new OpenAICompatibleModelProvider({
  baseUrl: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY!,
  rateLimiter: limiter,
});
Compatible clients: anything that exposes eval(script, keys, args) matching the standard Redis Lua surface. Tested against ioredis and @upstash/redis. fabric-harness depends on neither, so bring your own client.
Atomicity: every acquire runs a single Lua script that reads the bucket, refills based on elapsed time, and either consumes tokens or returns the wait duration. No race window.
Bucket TTL: Redis keys auto-expire 60s after the bucket would be fully refilled, so abandoned bucket keys (e.g. one-off API keys) eventually disappear.
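The read, refill, consume-or-wait sequence and the TTL rule can be sketched as a pure function. This is a TypeScript mirror of the logic described above, not the Lua script that ships with redisRateLimiter(): the names `BucketState`, `StepResult`, and `step` are hypothetical, one token per call is assumed, and a missing key is assumed to mean a full bucket.

```typescript
// Hypothetical types and function; the shipped Lua script may differ.
interface BucketState {
  tokens: number;      // tokens currently available
  updatedAtMs: number; // timestamp of the last refill
}

interface StepResult {
  state: BucketState;  // what would be written back to Redis
  waitMs: number;      // 0 if a token was consumed, else how long to wait
  ttlSeconds: number;  // time until the bucket is full again, plus 60s
}

function step(
  prev: BucketState | null,
  nowMs: number,
  tokensPerSecond: number,
  burst: number,
): StepResult {
  // Assumption: a missing (or expired) key behaves like a full bucket.
  const base = prev ?? { tokens: burst, updatedAtMs: nowMs };

  // Refill based on elapsed time, capped at the burst size.
  const elapsedSec = Math.max(0, nowMs - base.updatedAtMs) / 1000;
  let tokens = Math.min(burst, base.tokens + elapsedSec * tokensPerSecond);

  // Either consume one token or report how long until one is available.
  let waitMs = 0;
  if (tokens >= 1) {
    tokens -= 1;
  } else {
    waitMs = Math.ceil(((1 - tokens) * 1000) / tokensPerSecond);
  }

  // Expire the key 60s after the point where it would be full again.
  const secondsToFull = (burst - tokens) / tokensPerSecond;
  const ttlSeconds = Math.ceil(secondsToFull) + 60;

  return { state: { tokens, updatedAtMs: nowMs }, waitMs, ttlSeconds };
}

// An empty bucket at 100 tokens/s with burst 500:
console.log(step({ tokens: 0, updatedAtMs: 0 }, 0, 100, 500));
// waitMs is 10 and ttlSeconds is 65
```

Running these steps inside one script on the Redis server is what removes the race window: no other client can read or write the bucket between the refill and the consume.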
See also
- Cost budgets — for USD ceilings
- Multi-tenancy