Databricks

SQL Warehouse, Jobs, notebooks, Unity Catalog, MLflow, and workspace files for Fabric Harness agents.

Fabric Harness ships Databricks integration helpers in @fabric-harness/databricks. They are designed for data, analytics, ML, and governance agents that need to query warehouse data, trigger jobs, inspect Unity Catalog, run notebooks, or write MLflow telemetry without putting Databricks credentials in model context.

Status: integration helper package with mocked unit tests and live-gated tests. It is ready for controlled pilots when you provide a Databricks host/token and opt into live tests. A SQL Warehouse-backed SandboxEnv (databricksSqlSandbox) ships in v0.6; cluster/notebook-backed SandboxEnv modes remain future work.

Install

npm install @fabric-harness/databricks @fabric-harness/sdk

Client

import { createDatabricksClient } from '@fabric-harness/databricks';

const databricks = createDatabricksClient({
  host: process.env.DATABRICKS_HOST!,
  token: process.env.DATABRICKS_TOKEN!,
});

Use a token provider function when running with managed credentials:

const databricks = createDatabricksClient({
  host: process.env.DATABRICKS_HOST!,
  token: async () => getDatabricksTokenFromVault(),
});
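
A provider function is called whenever the client needs a token, so it can be worth caching the result instead of hitting the vault on every request. The following is a minimal sketch under that assumption; `cachedTokenProvider` and its parameters are hypothetical names, not part of @fabric-harness/databricks.

```typescript
// Hypothetical helper: wraps any async token fetcher with a time-based cache.
type TokenFetcher = () => Promise<string>;

function cachedTokenProvider(
  fetchToken: TokenFetcher,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock, mainly for tests
): TokenFetcher {
  let cached: { token: string; expiresAt: number } | undefined;
  let inFlight: Promise<string> | undefined;

  return async () => {
    // Serve from cache while the entry is still fresh.
    if (cached && now() < cached.expiresAt) return cached.token;

    // Share one in-flight fetch so concurrent callers trigger a single lookup.
    if (!inFlight) {
      inFlight = fetchToken()
        .then((token) => {
          cached = { token, expiresAt: now() + ttlMs };
          return token;
        })
        .finally(() => {
          inFlight = undefined;
        });
    }
    return inFlight;
  };
}
```

In the client configuration above, this would be passed as `token: cachedTokenProvider(getDatabricksTokenFromVault, 10 * 60_000)`. Pick a TTL shorter than the lifetime of the credentials your vault issues.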

SQL Warehouse tool

import { init } from '@fabric-harness/sdk';
import { databricksSqlTool } from '@fabric-harness/databricks';

const sql = databricksSqlTool(databricks, {
  warehouseId: process.env.DATABRICKS_WAREHOUSE_ID!,
});

const fabric = await init({ tools: [sql] });
const session = await fabric.session();
await session.prompt('Query revenue by month. Use only SELECT statements.');

The tool calls /api/2.0/sql/statements. For long-running statements, use waitForDatabricksStatement() in trusted code.

import { waitForDatabricksStatement } from '@fabric-harness/databricks';

const submitted = await sql.execute?.({ statement: 'select 1' }) as { statement_id?: string };
if (submitted?.statement_id) {
  await waitForDatabricksStatement(databricks, submitted.statement_id);
}
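
To make the waiting step concrete, here is a minimal sketch of a polling loop over the Statement Execution API's statement status. It is not the implementation of waitForDatabricksStatement. `pollStatement`, `GetStatement`, and the option names are hypothetical; the state values are the ones the Databricks API documents. The caller supplies the function that performs the authenticated GET on /api/2.0/sql/statements/{statement_id}.

```typescript
type StatementState = 'PENDING' | 'RUNNING' | 'SUCCEEDED' | 'FAILED' | 'CANCELED' | 'CLOSED';

interface StatementStatus {
  statement_id: string;
  status: { state: StatementState };
}

// Hypothetical: performs the authenticated GET for one statement.
type GetStatement = (statementId: string) => Promise<StatementStatus>;

const TERMINAL_STATES = new Set<StatementState>(['SUCCEEDED', 'FAILED', 'CANCELED', 'CLOSED']);

async function pollStatement(
  getStatement: GetStatement,
  statementId: string,
  opts: { intervalMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<StatementStatus> {
  const intervalMs = opts.intervalMs ?? 1000;
  const maxAttempts = opts.maxAttempts ?? 60;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await getStatement(statementId);
    // Stop as soon as the statement reaches any terminal state, success or not.
    if (TERMINAL_STATES.has(current.status.state)) return current;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Statement ${statementId} did not finish after ${maxAttempts} polls`);
}
```

The loop returns on every terminal state, so the caller still has to check whether the final state is SUCCEEDED before reading results.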

SQL Warehouse sandbox

Run an entire agent against a SQL Warehouse, with exec() mapped to SQL statement execution:

import { init } from '@fabric-harness/sdk';
import { databricksSqlSandbox } from '@fabric-harness/databricks/sql-sandbox';

const fabric = await init({
  sandbox: databricksSqlSandbox({
    host: process.env.DATABRICKS_HOST!,
    token: process.env.DATABRICKS_TOKEN!,
    warehouseId: process.env.DATABRICKS_WAREHOUSE_ID!,
    catalog: 'main',     // optional Unity Catalog catalog name
    schema: 'analytics', // optional schema
    resultFormat: 'jsonl', // 'jsonl' (default) or 'csv'
  }),
});

const session = await fabric.session();
const result = await session.shell('SELECT customer_id, SUM(amount) FROM main.analytics.orders GROUP BY 1 ORDER BY 2 DESC LIMIT 10');
console.log(result.stdout); // → newline-delimited JSON rows

Files written through the sandbox are held in an in-memory map for the session, which is useful for staging small CSV or Markdown summaries the agent emits. For real data files, mount a volume with databricksVolumeSource (below) instead.
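
With resultFormat set to 'jsonl', the stdout returned by session.shell() can be turned into typed rows in trusted code. The sketch below assumes each non-empty line is one JSON object keyed by column name; `parseJsonl` is a hypothetical helper, and the exact row shape the sandbox emits should be confirmed against real output.

```typescript
// Hypothetical helper: parse newline-delimited JSON into an array of rows.
function parseJsonl<T = Record<string, unknown>>(stdout: string): T[] {
  return stdout
    .split('\n')
    .map((line) => line.trim()) // also drops a trailing '\r' from CRLF output
    .filter((line) => line.length > 0) // ignore blank lines and the final newline
    .map((line) => JSON.parse(line) as T);
}

// Usage with a literal string standing in for result.stdout:
const rows = parseJsonl<{ customer_id: string; total: number }>(
  '{"customer_id":"c1","total":120}\n{"customer_id":"c2","total":80}\n',
);
console.log(rows.length);
```

JSON.parse throws on a malformed line, which is usually the right behaviour for trusted code that should not silently skip rows.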

Unity Catalog volumes

Mount a UC volume as files inside any sandbox:

import { databricksVolumeSource } from '@fabric-harness/connectors/databricks-volume';

await session.mount('/mnt/landing', databricksVolumeSource({
  host: process.env.DATABRICKS_HOST!,
  token: process.env.DATABRICKS_TOKEN!,
  volumePath: '/Volumes/main/landing/raw',
}));

await session.shell('grep -c error /mnt/landing/2026/01/jan.log');

Jobs and notebooks

Trigger existing Jobs:

import { databricksRunJobTool } from '@fabric-harness/databricks';

const tools = [databricksRunJobTool(databricks)];

Submit a one-off notebook run:

import { databricksNotebookTool } from '@fabric-harness/databricks';

const notebook = databricksNotebookTool(databricks, {
  existingClusterId: process.env.DATABRICKS_CLUSTER_ID,
});

Wait for runs in trusted code:

import { waitForDatabricksRun } from '@fabric-harness/databricks';

const run = await databricksRunJobTool(databricks).execute?.({ jobId: 123 }) as { run_id?: number };
if (run?.run_id) await waitForDatabricksRun(databricks, run.run_id);
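
Once a run has finished, trusted code usually needs to decide whether it actually succeeded. The sketch below classifies a state object shaped like the `state` field of the Jobs API run response (`life_cycle_state`, `result_state`, `state_message`). Whether waitForDatabricksRun returns this shape is an assumption, and `classifyRun` is a hypothetical helper.

```typescript
// Subset of the Jobs API run state used by this sketch.
interface RunState {
  life_cycle_state: string;
  result_state?: string;
  state_message?: string;
}

type RunOutcome =
  | { kind: 'running' }
  | { kind: 'succeeded' }
  | { kind: 'failed'; reason: string };

// Life-cycle states after which the run will not progress further.
const FINISHED_STATES = new Set(['TERMINATED', 'SKIPPED', 'INTERNAL_ERROR']);

function classifyRun(state: RunState): RunOutcome {
  if (!FINISHED_STATES.has(state.life_cycle_state)) return { kind: 'running' };

  // Only TERMINATED + SUCCESS counts as success; skipped runs are treated as failures here.
  if (state.life_cycle_state === 'TERMINATED' && state.result_state === 'SUCCESS') {
    return { kind: 'succeeded' };
  }
  const reason = state.state_message || state.result_state || state.life_cycle_state;
  return { kind: 'failed', reason };
}
```

Treating SKIPPED as a failure is a design choice for agents that expect the job to produce output; relax it if skipped runs are acceptable in your pipeline.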

Unity Catalog discovery

import { init } from '@fabric-harness/sdk';
import { unityCatalogTablesTool } from '@fabric-harness/databricks';

const uc = unityCatalogTablesTool(databricks);
const fabric = await init({ tools: [uc] });

This gives agents read-only discovery over catalog/schema table metadata. Combine it with policy prompts that require explicit table names and approved query shapes before invoking SQL.
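
Policy prompts can be backed by a check in trusted code before a statement reaches the warehouse. The following is a rough sketch of such a check, assuming an allowlist of fully qualified table names. `checkSelectOnly` is a hypothetical helper built on regular expressions, not a SQL parser: it misses comma joins, rejects CTEs, and can be confused by semicolons or comment markers inside string literals, so Unity Catalog permissions should remain the enforcement layer.

```typescript
interface GuardResult {
  ok: boolean;
  reason?: string;
}

function checkSelectOnly(statement: string, allowedTables: readonly string[]): GuardResult {
  // Strip line and block comments, then a single trailing semicolon.
  const sql = statement
    .replace(/--[^\n]*/g, ' ')
    .replace(/\/\*[\s\S]*?\*\//g, ' ')
    .trim()
    .replace(/;\s*$/, '');

  if (sql.includes(';')) return { ok: false, reason: 'multiple statements are not allowed' };
  if (!/^select\b/i.test(sql)) return { ok: false, reason: 'only SELECT statements are allowed' };

  const allowed = new Set(allowedTables.map((t) => t.toLowerCase()));
  // Naive extraction of the identifier that follows FROM or JOIN.
  const tableRef = /\b(?:from|join)\s+([`\w.]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = tableRef.exec(sql)) !== null) {
    const ref = match[1].replace(/`/g, '').toLowerCase();
    if (ref.split('.').length !== 3) {
      return { ok: false, reason: `table ${ref} must be written as catalog.schema.table` };
    }
    if (!allowed.has(ref)) return { ok: false, reason: `table ${ref} is not on the allowlist` };
  }
  return { ok: true };
}
```

A guard like this is best used to fail fast with a readable reason the agent can act on, while the service principal's grants decide what is really reachable.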

MLflow logging

import {
  databricksMlflowLogMetricTool,
  databricksMlflowLogParamTool,
} from '@fabric-harness/databricks';

const tools = [
  databricksMlflowLogMetricTool(databricks),
  databricksMlflowLogParamTool(databricks),
];

Use these for evaluation or data-profiler agents that should write run metrics back to MLflow.

Workspace files as a filesystem source

Mount exported workspace notebooks/files into the Fabric sandbox so agents can read, grep, and glob them like local files:

import {
  databricksWorkspaceSource,
} from '@fabric-harness/databricks';
import { init, withFilesystemSources } from '@fabric-harness/sdk';

const sandbox = withFilesystemSources('virtual', [{
  mountAt: '/workspace/databricks',
  source: databricksWorkspaceSource(databricks, '/Repos/acme/analytics'),
}]);

const fabric = await init({ sandbox });

The source uses /api/2.0/workspace/list and /api/2.0/workspace/export.
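
As an illustration of how a directory tree can be enumerated with the list endpoint, here is a minimal sketch that walks a workspace path iteratively. It is not the package's implementation. `walkWorkspace` and `ListDir` are hypothetical names; the `objects`, `path`, and `object_type` fields follow the Workspace API response shape. The caller supplies the function that performs the authenticated GET on /api/2.0/workspace/list.

```typescript
interface WorkspaceObject {
  path: string;
  object_type: string; // e.g. 'NOTEBOOK', 'DIRECTORY', 'FILE', 'REPO'
}

// Hypothetical: lists one directory; `objects` can be absent for an empty directory.
type ListDir = (path: string) => Promise<{ objects?: WorkspaceObject[] }>;

async function walkWorkspace(listDir: ListDir, root: string): Promise<string[]> {
  const files: string[] = [];
  const pending: string[] = [root];

  while (pending.length > 0) {
    const dir = pending.pop() as string;
    const { objects = [] } = await listDir(dir);
    for (const obj of objects) {
      if (obj.object_type === 'DIRECTORY' || obj.object_type === 'REPO') {
        pending.push(obj.path); // descend later
      } else {
        files.push(obj.path); // notebooks and files are leaves
      }
    }
  }
  return files.sort();
}
```

Each leaf path returned here would then be a candidate for a separate export call; large trees may need a concurrency limit and a depth cap.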

Security model

Keep DATABRICKS_TOKEN in env, Key Vault, or your runtime secret manager.
Prefer least-privilege service principals.
Use Unity Catalog permissions as the primary data governance layer.
Treat SQL and Jobs tools as execute effects and gate risky writes with Fabric policy/approvals.
Do not pass tokens (including PATs), warehouse IDs, or cluster IDs in prompts.
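
The last rule can be checked mechanically before a prompt is sent. The sketch below is a heuristic, not a guarantee. It assumes personal access tokens look like `dapi` followed by 32 hex characters, which matches commonly issued PATs but is not a documented contract, and it lets the caller pass the literal warehouse and cluster IDs it already holds. `findSecrets` is a hypothetical helper.

```typescript
const SECRET_PATTERNS: { name: string; pattern: RegExp }[] = [
  // Assumed PAT shape: "dapi" + 32 hex characters, optionally followed by "-<digits>".
  { name: 'Databricks PAT', pattern: /\bdapi[0-9a-f]{32}(?:-\d+)?\b/i },
  { name: 'Bearer token', pattern: /\bBearer\s+[A-Za-z0-9._~+\/-]{20,}/ },
];

function findSecrets(prompt: string, knownIdentifiers: readonly string[] = []): string[] {
  const hits = SECRET_PATTERNS
    .filter(({ pattern }) => pattern.test(prompt))
    .map(({ name }) => name);

  // Literal match for IDs the caller knows, such as a warehouse or cluster ID.
  for (const id of knownIdentifiers) {
    if (id.length > 0 && prompt.includes(id)) hits.push('configured identifier');
  }
  return hits;
}
```

Calling `findSecrets(promptText, [warehouseId, clusterId])` and refusing to send the prompt when the result is non-empty gives a cheap last line of defence in trusted code.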

Live tests

Live tests are skipped unless explicitly enabled:

FABRIC_DATABRICKS_TEST=1 \
DATABRICKS_HOST=https://dbc-...cloud.databricks.com \
DATABRICKS_TOKEN=... \
pnpm --filter @fabric-harness/databricks test

Optional resources unlock deeper checks:

DATABRICKS_WAREHOUSE_ID=...
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default
DATABRICKS_JOB_ID=...
DATABRICKS_NOTEBOOK_PATH=/Repos/acme/smoke
DATABRICKS_CLUSTER_ID=...
DATABRICKS_MLFLOW_RUN_ID=...
DATABRICKS_WORKSPACE_ROOT=/Repos/acme
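
The gating described above can be pictured as a small function that turns environment variables into an optional test configuration. This is a sketch of the pattern only. `readLiveConfig` and `LiveConfig` are hypothetical, and the exact comparison the package's test suite applies to FABRIC_DATABRICKS_TEST is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface LiveConfig {
  host: string;
  token: string;
  warehouseId: string | undefined; // optional resource: unlocks SQL checks
  jobId: number | undefined; // optional resource: unlocks Jobs checks
}

// Returns null when live tests are disabled, so callers can skip instead of fail.
function readLiveConfig(env: Env): LiveConfig | null {
  if (env.FABRIC_DATABRICKS_TEST !== '1') return null;

  const host = env.DATABRICKS_HOST;
  const token = env.DATABRICKS_TOKEN;
  if (!host || !token) {
    throw new Error('FABRIC_DATABRICKS_TEST=1 requires DATABRICKS_HOST and DATABRICKS_TOKEN');
  }
  return {
    host,
    token,
    warehouseId: env.DATABRICKS_WAREHOUSE_ID,
    jobId: env.DATABRICKS_JOB_ID ? Number(env.DATABRICKS_JOB_ID) : undefined,
  };
}
```

In a Node test file this would be called as `readLiveConfig(process.env)`, with each deeper check skipped when its optional field is undefined.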

What is not implemented yet

A Databricks cluster/notebook-backed SandboxEnv. The SQL Warehouse-backed databricksSqlSandbox already ships; see above.
A first-class deployment target that packages and deploys Fabric agents as Databricks Jobs.
Unity Catalog lineage writeback helpers.

See docs/ROADMAP.md for status.
