Execution & Agents

How AXIS executes scenarios, manages agent processes, and isolates workspaces.

Execution Model

When you invoke the axis run command, AXIS loads your config, discovers scenarios, and executes each scenario/agent combination as an independent job. Jobs run in parallel up to the configured concurrency limit (default: 15).
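The concurrency limit described above can be pictured as a small worker pool. The following is a minimal sketch of that general technique, not AXIS's actual scheduler; runWithLimit is a hypothetical helper name, and the sketch assumes each task handles its own errors.

```typescript
// Hypothetical helper (not part of AXIS): run async tasks with at most
// `limit` of them in flight at once. Results keep the input order.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been claimed yet

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With a limit of 15, a run of 40 jobs would start 15 immediately and begin each remaining job as soon as an earlier one finishes.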

Each job follows the same lifecycle:

Setup: Run setup actions defined in the scenario (if any).
Spawn: Start the agent process in an isolated workspace.
Capture: Stream and record the full interaction transcript.
Score: Evaluate the transcript against the judge and interaction signals (unless --no-score is set).
Teardown: Run teardown actions (if any).
Save: Write the result to the report.

Each scenario has a 15-minute timeout by default (configurable via limits.scenario.time_minutes or per-scenario limits). If the agent does not finish in time, AXIS sends SIGTERM, waits briefly, then sends SIGKILL. Timed-out runs are marked as failed with a timeout error.
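The SIGTERM-then-SIGKILL escalation can be sketched with Node's child_process module. This is an illustration of the pattern only: runWithTimeout and TimedResult are hypothetical names, and the graceMs default is an assumption, since the text above only says AXIS "waits briefly".

```typescript
import { spawn } from "node:child_process";

interface TimedResult {
  exitCode: number | null; // null when the process was ended by a signal
  timedOut: boolean;
}

// Hypothetical helper (not the AXIS implementation): run a command and
// escalate from SIGTERM to SIGKILL if it outlives `timeoutMs`.
function runWithTimeout(
  command: string,
  args: string[],
  timeoutMs: number,
  graceMs = 5_000, // assumed grace period between SIGTERM and SIGKILL
): Promise<TimedResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: "ignore" });
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    // First ask politely, then force termination after the grace period.
    const termTimer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM");
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs);
    }, timeoutMs);

    const clearTimers = (): void => {
      clearTimeout(termTimer);
      if (killTimer !== undefined) clearTimeout(killTimer);
    };

    child.once("error", (err) => {
      clearTimers();
      reject(err);
    });
    child.once("exit", (code) => {
      clearTimers();
      resolve({ exitCode: code, timedOut });
    });
  });
}
```

A caller would treat timedOut as a failed run regardless of the exit code, matching the behavior described above.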

Supported Agents

AXIS ships native adapters for Claude Code, Codex, and Gemini, plus every major AI coding agent that speaks the Agent Client Protocol (ACP). See Built-in Agents for the full list with required environment variables.

Custom Agents

You can test any agent by creating a custom agent module. Use the createAgentAdapter() factory for agents that produce NDJSON or plain text streams, or createAcpBasedAdapter() for agents that speak the Agent Client Protocol (any CLI with an --acp mode). Register the module in your config.

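// adapters/my-agent.ts
import { createAgentAdapter } from "@netlify/axis";

// The type parameter describes the per-run state the adapter accumulates.
export default createAgentAdapter<{ stdout: string }>({
  name: "my-agent",
  // The CLI binary to launch for this agent.
  resolveCommand: () => ({ command: "my-cli", prefixArgs: [] }),
  // Pass the scenario prompt as the only argument.
  buildArgs: (input) => [input.prompt],
  initialState: () => ({ stdout: "" }),
  streamConfig: {
    mode: "aggregate",
    // Append each raw output chunk to the accumulated state.
    onChunk: (chunk, ctx) => {
      ctx.state.stdout += chunk;
    },
  },
  // Report the trimmed output, or null if the agent printed nothing.
  getResult: (ctx) => ({
    result: ctx.state.stdout.trim() || null,
  }),
});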

{
  "adapters": {
    "my-agent": "./adapters/my-agent.ts"
  },
  "agents": ["my-agent"]
}

Stream Modes

Custom agents support two modes for processing output:

Lines mode: For agents that emit NDJSON (one JSON object per line). AXIS parses each line and passes the parsed object to your onLine handler. The native claude-code and codex adapters use this mode. ACP-based adapters bypass streamConfig entirely; the ACP SDK handles framing.
Aggregate mode: For agents that emit plain text or non-JSON output. Raw chunks are passed to onChunk and accumulated in state. Use this for agents with custom output formats or simple stdout capture.
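To make the difference concrete, here is a minimal sketch of what line-oriented NDJSON framing involves: chunks can end mid-line, so a parser has to buffer the partial tail until the next newline arrives. createLineParser is a hypothetical helper for illustration and is not AXIS's internal parser; in lines mode AXIS does this work for you before calling onLine.

```typescript
// Hypothetical helper: split a chunked text stream into NDJSON records.
function createLineParser(onLine: (value: unknown) => void) {
  let buffer = ""; // holds any incomplete trailing line between chunks

  return {
    // Feed one raw chunk; emits every complete line it contains.
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (line !== "") onLine(JSON.parse(line));
        newline = buffer.indexOf("\n");
      }
    },
    // Call at end of stream to emit a final line that lacks a newline.
    flush(): void {
      const line = buffer.trim();
      buffer = "";
      if (line !== "") onLine(JSON.parse(line));
    },
  };
}
```

Aggregate mode skips this framing entirely: each chunk is handed over as-is, which is why it suits plain text or custom formats.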

The module must export an AgentAdapter as the default export or as a named adapter export.

Workspace Isolation

Each agent run gets a fresh temporary directory as its workspace. AXIS isolates the following to prevent configuration leakage and cross-run interference:

HOME directory: Set to a per-job sibling directory of the workspace (not the workspace itself), so each agent's own config files (.claude/, .codex/, etc.) live alongside the workspace and never appear in the directory the agent is scanning.
Agent-specific dirs: CLAUDE_CONFIG_DIR, CODEX_HOME, GEMINI_CLI_HOME are all set to isolated paths inside the per-job HOME.
Environment variables: Only explicitly listed vars and system essentials (PATH, USER, SHELL, LANG, TERM, TMPDIR) are passed through.
AXIS_CONFIG_DIR: Set to the absolute path of the directory containing axis.config.{json,ts,js,mjs}. Lifecycle scripts and the agent process can use it to reference versioned fixtures or helper scripts. See Lifecycle environment variables.
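The isolation rules above amount to building the child environment from an allowlist rather than inheriting the parent's. The sketch below illustrates that idea under stated assumptions: buildIsolatedEnv is a hypothetical function, and the subdirectory names under the per-job HOME (.claude, .codex, .gemini) are guesses, since the text only says these paths live inside that HOME.

```typescript
// System essentials that are always passed through, per the list above.
const SYSTEM_ESSENTIALS = ["PATH", "USER", "SHELL", "LANG", "TERM", "TMPDIR"];

// Hypothetical helper (not the AXIS implementation): derive a per-job
// environment from the parent environment and an explicit allowlist.
function buildIsolatedEnv(
  parentEnv: Record<string, string | undefined>,
  passThrough: string[], // explicitly listed vars, e.g. API keys
  jobHome: string,       // per-job HOME, a sibling of the workspace
  configDir: string,     // directory containing the AXIS config file
): Record<string, string> {
  const env: Record<string, string> = {};

  // Copy only allowlisted variables; everything else is dropped.
  for (const name of [...SYSTEM_ESSENTIALS, ...passThrough]) {
    const value = parentEnv[name];
    if (value !== undefined) env[name] = value;
  }

  env.HOME = jobHome;
  // Assumed subdirectory names; the real paths may differ.
  env.CLAUDE_CONFIG_DIR = `${jobHome}/.claude`;
  env.CODEX_HOME = `${jobHome}/.codex`;
  env.GEMINI_CLI_HOME = `${jobHome}/.gemini`;
  env.AXIS_CONFIG_DIR = configDir;
  return env;
}
```

Starting from an empty object, rather than deleting keys from a copy of the parent environment, means a newly introduced secret in the parent shell is excluded by default.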

For NDJSON-style agents, MCP server configuration files are written into each agent's isolated config directory (under the per-job HOME) before spawn, in the format native to each CLI. ACP-based adapters pass MCP servers through the ACP session/new call instead. See MCP Servers in the configuration reference.

Multi-variant Scenarios

A single scenario file can produce multiple jobs by defining variants. Each variant runs as an independent job with its own key, inheriting the base scenario's fields and applying any overrides. This is useful for testing the same task under different tool configurations, prompts, or agent restrictions without duplicating scenario files.

For example, a scenario with two variants and two agents produces four jobs (2 variants × 2 agents). Each variant appears as a separate row in the CLI output and a separate entry in reports, identified by its @-suffixed key (e.g., create-post@with-mcp).
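The variant-by-agent expansion can be sketched as a small cross product. The types here (ScenarioFields, Scenario, Job) and the expandJobs function are hypothetical and deliberately simplified; the real scenario schema is in the Writing Scenarios reference. Only the @-suffixed key format and the inherit-then-override behavior come from the description above.

```typescript
// Hypothetical, simplified scenario shape for illustration only.
interface ScenarioFields {
  prompt: string;
  tools?: string[];
}

interface Scenario extends ScenarioFields {
  key: string;
  variants?: Record<string, Partial<ScenarioFields>>;
}

interface Job {
  key: string;
  agent: string;
  fields: ScenarioFields;
}

// Expand one scenario into one job per (variant, agent) pair.
function expandJobs(scenario: Scenario, agents: string[]): Job[] {
  const { key, variants, ...base } = scenario;
  const expanded: Array<{ key: string; fields: ScenarioFields }> = [];

  const names = variants ? Object.keys(variants) : [];
  if (variants && names.length > 0) {
    for (const name of names) {
      // Each variant inherits the base fields, then applies its overrides.
      expanded.push({ key: `${key}@${name}`, fields: { ...base, ...variants[name] } });
    }
  } else {
    // No variants: the scenario itself is the only entry.
    expanded.push({ key, fields: base });
  }

  const jobs: Job[] = [];
  for (const entry of expanded) {
    for (const agent of agents) {
      jobs.push({ key: entry.key, agent, fields: entry.fields });
    }
  }
  return jobs;
}
```

For a create-post scenario with variants with-mcp and without-mcp and two agents, this yields four jobs, two of them keyed create-post@with-mcp.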

Report manifest entries include a failed boolean computed from the full run output before transcripts are stripped from report.json. This preserves the correct status for agents that return a final result even when their process exits non-zero during cleanup.

See Writing Scenarios → Variants for the full field reference and examples.