Databricks
First-party Databricks integration — Mosaic AI model serving, Unity Catalog governance, RAG, Lakebase, deploy targets, and cost reconciliation.
@fabric-harness/databricks is a first-party integration for building governed agents on
Databricks. One databricks() call wires the agent's brain (Mosaic AI Model Serving), tools
(governed SQL, Unity Catalog, AI Functions, Genie, Lakeflow, Vector Search, Feature Serving), state
(Lakebase), governance (Unity Catalog), and cost controls, all under a single principal. Everything is
layered on top of Unity Catalog; the integration never re-implements UC permissions.
Status: controlled-pilot ready with mocked unit tests and live-gated tests. The REST/serving API shapes are written against current Databricks contracts; smoke-test against a live workspace before GA.
The bundle
import { databricks } from '@fabric-harness/databricks';

const dbx = databricks({
  host: process.env.DATABRICKS_HOST!,
  // UC enforces this principal's grants on every call.
  principal: {
    kind: 'service-principal',
    host: process.env.DATABRICKS_HOST!,
    clientId: process.env.DATABRICKS_CLIENT_ID!,
    clientSecret: process.env.DATABRICKS_CLIENT_SECRET!,
  },
  model: 'databricks-meta-llama-3-3-70b-instruct',
  warehouseId: process.env.DATABRICKS_WAREHOUSE_ID!,
  vectorSearch: { index: 'main.kb.docs_index', textColumn: 'chunk', idColumn: 'id' }, // RAG
  aiFunctions: true, // ai_query()
  genie: { spaceId: 'space-123' }, // NL → SQL analytics
  lakeflow: true, // pipeline tools
  consumption: true, // System-Tables cost reporting
  featureServing: { endpoint: 'user-features' },
  governance: { stewardAudience: 'data-steward', onLineage: (r) => console.log('[lineage]', r) },
});
// dbx.modelProvider, dbx.tools, dbx.policy, dbx.retriever, dbx.consumption, dbx.store, dbx.identity

databricks/<endpoint> model refs also resolve through the SDK, so
FABRIC_MODEL=databricks/<endpoint> works directly.
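A model ref of this form splits into a provider prefix and a serving-endpoint name. The sketch below shows one way such a ref could be parsed; parseModelRef is a hypothetical helper for illustration only and is not the SDK's actual resolver.

```typescript
// Hypothetical helper: illustrates the `databricks/<endpoint>` ref shape only.
interface ModelRef {
  provider: string;
  endpoint: string;
}

function parseModelRef(ref: string): ModelRef | null {
  const slash = ref.indexOf('/');
  // Require a non-empty provider and a non-empty endpoint name.
  if (slash <= 0 || slash === ref.length - 1) return null;
  return { provider: ref.slice(0, slash), endpoint: ref.slice(slash + 1) };
}

const parsedRef = parseModelRef('databricks/databricks-meta-llama-3-3-70b-instruct');
```

Splitting on the first slash only means an endpoint name that itself contains a slash would be kept intact in the endpoint part.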
Model serving
Databricks serving endpoints expose an OpenAI-compatible API, so
databricksFoundationModelProvider({ host, token }) is a thin wrapper that inherits streaming,
tool-calls, and reasoning effort. The same token threads through every REST and data call.
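To make "OpenAI-compatible" concrete, here is a minimal sketch of the HTTP request such a provider would end up sending. buildChatRequest is a hypothetical helper, not an export of the package, and the /serving-endpoints/chat/completions path is assumed from the OpenAI-compatible convention; confirm it against your workspace before relying on it.

```typescript
// Sketch only: builds the request; nothing is sent.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface HttpRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper; the URL path is an assumption (see lead-in).
function buildChatRequest(
  host: string,
  token: string,
  endpoint: string,
  messages: ChatMessage[],
): HttpRequest {
  const base = host.replace(/\/+$/, ''); // tolerate a trailing slash on the host
  return {
    url: `${base}/serving-endpoints/chat/completions`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    // In the OpenAI-compatible shape, the serving endpoint name is passed as `model`.
    body: JSON.stringify({ model: endpoint, messages, stream: false }),
  };
}

const chatReq = buildChatRequest(
  'https://example.cloud.databricks.com/',
  'dapi-REDACTED',
  'databricks-meta-llama-3-3-70b-instruct',
  [{ role: 'user', content: 'Summarize the top five customers by revenue.' }],
);
```

The resulting object can be passed to any HTTP client, for example fetch(chatReq.url, chatReq).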
Identity & Unity Catalog governance
databricksIdentity() produces a rotating token from a PAT, from an OAuth service principal
(cached + refreshed), or on behalf of a specific end user. UC enforces that principal's
table/row/column grants natively: the agent cannot read any table it lacks SELECT on. On
top of UC, Fabric adds:
- Lineage/audit: withGovernance() stamps every tool call (principal, service, catalog/schema) to an onLineage sink (secrets redacted).
- Approval routing: databricksGovernancePolicy() routes sensitive (write/execute) tools to a steward audience via CapabilityPolicy.approvalRules.
- Egress allowlist: outbound network pinned to the workspace host.
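The lineage idea can be sketched as a wrapper that emits a redacted record before each tool call. Everything here (withLineage, redact, the LineageRecord shape, the key pattern used to spot secrets) is a hypothetical illustration of the concept, not the behavior or API of withGovernance().

```typescript
// All names are hypothetical; this mirrors the idea, not the SDK's implementation.
interface LineageRecord {
  tool: string;
  principal: string;
  catalog: string | undefined;
  schema: string | undefined;
  args: Record<string, unknown>;
}

const SECRET_KEY = /token|secret|password|authorization/i;

// Replace values whose key looks secret-bearing; nested values are handled recursively.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = SECRET_KEY.test(k) ? '[redacted]' : redact(v);
    }
    return out;
  }
  return value;
}

type Tool = (args: Record<string, unknown>) => unknown;

// Wrap a tool so every call emits a lineage record before the tool runs.
function withLineage(
  toolName: string,
  principal: string,
  tool: Tool,
  onLineage: (r: LineageRecord) => void,
): Tool {
  return (args) => {
    const catalog = args['catalog'];
    const schema = args['schema'];
    onLineage({
      tool: toolName,
      principal,
      catalog: typeof catalog === 'string' ? catalog : undefined,
      schema: typeof schema === 'string' ? schema : undefined,
      args: redact(args) as Record<string, unknown>,
    });
    return tool(args);
  };
}

const lineageLog: LineageRecord[] = [];
const governedSql = withLineage(
  'databricks_sql',
  'sp-analytics',
  (args) => `ran: ${String(args['statement'])}`,
  (r) => lineageLog.push(r),
);
const sqlResult = governedSql({ catalog: 'main', schema: 'kb', statement: 'SELECT 1', apiToken: 'abc' });
```

The tool itself still receives the unredacted arguments; only the copy sent to the sink is scrubbed.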
With channel actor propagation, an agent can act on behalf of the human who triggered it, so UC
enforces that user's grants — per-user data boundaries with no per-user policy code.
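The "cached + refreshed" behavior of a rotating token can be sketched as follows. The names (needsRefresh, rotatingToken, the 60-second skew) are assumptions for illustration and do not describe databricksIdentity() internals; for a service principal, fetchToken would typically perform an OAuth client-credentials exchange against the workspace.

```typescript
// Hypothetical names throughout; a sketch of "cached + refreshed", not the SDK's internals.
interface CachedToken {
  accessToken: string;
  expiresAtMs: number; // absolute expiry, epoch milliseconds
}

// Refresh slightly before expiry so in-flight calls never carry a stale token.
function needsRefresh(cached: CachedToken | null, nowMs: number, skewMs: number = 60000): boolean {
  return cached === null || nowMs >= cached.expiresAtMs - skewMs;
}

type FetchToken = () => Promise<{ access_token: string; expires_in: number }>;

// Returns a getter that reuses the cached token and refreshes only when needed.
function rotatingToken(fetchToken: FetchToken, now: () => number = Date.now): () => Promise<string> {
  let cached: CachedToken | null = null;
  let inflight: Promise<string> | null = null;

  return async function getToken(): Promise<string> {
    const current = cached;
    if (current !== null && !needsRefresh(current, now())) return current.accessToken;
    // Share one refresh across concurrent callers.
    if (inflight === null) {
      inflight = fetchToken()
        .then((t) => {
          cached = { accessToken: t.access_token, expiresAtMs: now() + t.expires_in * 1000 };
          return t.access_token;
        })
        .finally(() => {
          inflight = null;
        });
    }
    return inflight;
  };
}

// Usage with a stubbed token source; a real one would call the workspace's token endpoint.
const getToken = rotatingToken(async () => ({ access_token: 'tok-1', expires_in: 3600 }));
```

Injecting the clock and the token source keeps the refresh decision testable without a live workspace.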
Tools
| Tool | What |
|---|---|
| databricks_sql | Run SQL on a warehouse (governed). |
| databricks_unity_catalog_tables / databricks_table_info | Discover + describe UC tables. |
| search (Vector Search) | RAG retrieval over a Mosaic AI Vector Search index. |
| databricks_ai_query | In-warehouse ai_query() model inference (parameterized, injection-safe). |
| databricks_genie_ask | NL → SQL analytics over an AI/BI Genie space. |
| databricks_pipeline_* | List / status (read) + start / stop (execute) Lakeflow pipelines. |
| databricks_feature_lookup | Low-latency feature lookup from a Feature Serving endpoint. |
| databricks_consumption | Real DBUs + list cost from system.billing System Tables. |
Job, notebook, and MLflow tools (databricksRunJobTool, databricksNotebookTool,
databricksMlflowLogMetricTool) and the SQL-warehouse SandboxEnv (databricksSqlSandbox,
@fabric-harness/databricks/sql-sandbox) remain available as building blocks.
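As an illustration of what "parameterized, injection-safe" can mean for an ai_query() call, the sketch below builds a request body in the shape of the SQL Statement Execution API, passing the prompt as a named parameter instead of splicing it into the SQL text. buildAiQueryStatement and its endpoint-name check are hypothetical, and this is not necessarily how databricks_ai_query is implemented; whether a parameter marker is accepted in this position is worth confirming on your warehouse.

```typescript
// Sketch of a statement-execution request body; helper name and values are illustrative.
interface StatementParameter {
  name: string;
  value: string;
  type: 'STRING';
}

interface StatementRequest {
  warehouse_id: string;
  statement: string;
  parameters: StatementParameter[];
}

function buildAiQueryStatement(
  warehouseId: string,
  endpoint: string,
  prompt: string,
): StatementRequest {
  // The endpoint name is inlined as a literal, so restrict it to a safe character set.
  if (!/^[A-Za-z0-9_.-]+$/.test(endpoint)) {
    throw new Error(`invalid endpoint name: ${endpoint}`);
  }
  return {
    warehouse_id: warehouseId,
    // The named marker keeps user-supplied text out of the SQL string itself.
    statement: `SELECT ai_query('${endpoint}', :prompt) AS answer`,
    parameters: [{ name: 'prompt', value: prompt, type: 'STRING' }],
  };
}

const hostilePrompt = "'); DROP TABLE main.kb.docs; --";
const aiStmt = buildAiQueryStatement(
  'wh-123',
  'databricks-meta-llama-3-3-70b-instruct',
  hostilePrompt,
);
```

Because the prompt travels as a bound value, the hostile text above never becomes part of the SQL that the warehouse parses.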
State — Lakebase
Lakebase is managed Postgres. lakebaseClient() builds a Postgres client whose password is the
rotating OAuth token (evaluated per connection), so it drops into PostgresSessionStore — agent
sessions and the dispatch journal live in the lakehouse under the same principal.
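The "password evaluated per connection" idea can be sketched with a config object whose password field is a function. The shape is modeled on node-postgres pool options, where password may be a function; lakebasePoolConfig, the host, and the database name are hypothetical placeholders and not the lakebaseClient() API.

```typescript
// Shape modeled on node-postgres pool options; helper name and values are illustrative.
interface PgPoolConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  ssl: boolean;
  password: () => string | Promise<string>;
}

function lakebasePoolConfig(
  opts: { host: string; database: string; user: string },
  getAccessToken: () => string | Promise<string>,
): PgPoolConfig {
  return {
    host: opts.host,
    port: 5432,
    database: opts.database,
    user: opts.user,
    ssl: true,
    // Invoked each time a connection is opened, so a rotated token is picked up.
    password: () => getAccessToken(),
  };
}

// Simulate rotation with a mutable token; a real source would be the identity's token getter.
let currentToken = 'token-v1';
const poolConfig = lakebasePoolConfig(
  { host: 'instance.example.com', database: 'agent_state', user: 'sp-client-id' },
  () => currentToken,
);
const passwordBefore = poolConfig.password();
currentToken = 'token-v2';
const passwordAfter = poolConfig.password();
```

Long-lived pooled connections keep the credential they were opened with; only new connections see the rotated token.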
Deploy targets
Build a deployable artifact with fabric-harness build --target <name>:
- databricks-app: runs the agent in-workspace as a Databricks App (the app's service principal is the acting UC identity). Emits app.yaml (bridges DATABRICKS_APP_PORT) + deploy docs.
- databricks-serving: wrapper-only. Packages an MLflow ChatAgent proxy to an agent deployed elsewhere, registers it to Unity Catalog, and creates a serving endpoint that Agent Bricks and the Playground consume as a model/tool.
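One plausible reading of bridging DATABRICKS_APP_PORT is that the server listens on whatever port the platform injects. The sketch below shows that resolution step; resolvePort, the PORT fallback, and the default of 8000 are assumptions for illustration, not the contents of the emitted app.yaml.

```typescript
// Assumption: the platform injects DATABRICKS_APP_PORT; fallback order and default are illustrative.
function resolvePort(env: Record<string, string | undefined>, fallback: number = 8000): number {
  const raw = env['DATABRICKS_APP_PORT'] ?? env['PORT'];
  const port = raw === undefined ? NaN : Number(raw);
  // Accept only a valid TCP port; anything else falls back to the default.
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

const appPort = resolvePort({ DATABRICKS_APP_PORT: '8080' });
```

In a Node entry point this would be called as resolvePort(process.env) before starting the HTTP server.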
Agents are runtime-agnostic: the same agent can also run on Temporal/Node/Cloudflare and simply consume Databricks as a backend.
Cost reconciliation
databricksTenantCostLimit() enforces a perScope budget against real Databricks spend from
System Tables (estimates still guard perCall/perSession). See
Cost attribution.
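The split between real spend and estimates can be sketched as below. The SQL is an illustrative query over the billing System Tables, and the tenant tag key, the helper name exceededLimits, and the limit/state shapes are all assumptions; none of this is the databricksTenantCostLimit() implementation.

```typescript
// Illustrative query; the `tenant` custom tag key is an assumption about how usage is tagged.
const TENANT_SPEND_SQL = `
SELECT SUM(u.usage_quantity * lp.pricing.default) AS list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS lp
  ON u.sku_name = lp.sku_name
 AND u.cloud = lp.cloud
 AND u.usage_start_time >= lp.price_start_time
 AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
WHERE u.custom_tags['tenant'] = :tenant
  AND u.usage_date >= :period_start
`;

interface CostLimits {
  perCall: number;
  perSession: number;
  perScope: number;
}

interface CostState {
  callEstimate: number; // estimated cost of the pending call
  sessionEstimate: number; // estimated cost accumulated in this session
  scopeActual: number; // reconciled spend, e.g. the result of TENANT_SPEND_SQL
}

// Returns the names of every limit the current state exceeds.
function exceededLimits(limits: CostLimits, state: CostState): string[] {
  const exceeded: string[] = [];
  if (state.callEstimate > limits.perCall) exceeded.push('perCall');
  if (state.sessionEstimate > limits.perSession) exceeded.push('perSession');
  if (state.scopeActual > limits.perScope) exceeded.push('perScope');
  return exceeded;
}

const costCheck = exceededLimits(
  { perCall: 1, perSession: 5, perScope: 100 },
  { callEstimate: 0.2, sessionEstimate: 6, scopeActual: 99 },
);
```

Billing tables lag real usage, which is one reason to keep the estimate-based per-call and per-session guards alongside the reconciled scope budget.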
See also
- Examples: with-databricks (analytics copilot), with-databricks-rag (support agent), with-databricks-dataeng (pipelines), with-databricks-cost-attribution.
- Model providers · Channels · Cost attribution