Execution & Agents
How AXIS executes scenarios, manages agent processes, and isolates workspaces.
Execution Model
When you run axis run, AXIS loads your config, discovers scenarios, and executes
each scenario/agent combination as an independent job. Jobs run in parallel up to the configured
concurrency limit (default: 15).
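The fan-out described above can be sketched as a bounded worker pool. This is a minimal sketch, assuming jobs are plain scenario/agent pairs. `expandJobs` and `runWithConcurrency` are hypothetical helpers, not AXIS exports. The limit defaults to 15 to mirror the documented default.

```typescript
// Hypothetical job shape: one scenario/agent combination.
interface Job {
  scenario: string;
  agent: string;
}

// Expand every scenario x agent pair into an independent job.
function expandJobs(scenarios: string[], agents: string[]): Job[] {
  return scenarios.flatMap((scenario) =>
    agents.map((agent) => ({ scenario, agent })),
  );
}

// Run `worker` over all jobs with at most `limit` in flight at once.
// Results are returned in the same order as `jobs`.
async function runWithConcurrency<T>(
  jobs: Job[],
  worker: (job: Job) => Promise<T>,
  limit = 15, // mirrors the documented default concurrency
): Promise<T[]> {
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each lane keeps claiming the next unclaimed job until none remain.
  // Claiming (`next++`) is synchronous, so two lanes never take the same job.
  async function lane(): Promise<void> {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await worker(jobs[index]);
    }
  }

  const lanes = Array.from({ length: Math.min(limit, jobs.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

With this design, a slow job occupies only one lane, and the other lanes keep draining the queue.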
Each job follows the same lifecycle:
- Setup: Run setup actions defined in the scenario (if any).
- Spawn: Start the agent process in an isolated workspace.
- Capture: Stream and record the full interaction transcript.
- Score: Evaluate the transcript against the judge and interaction signals (unless --no-score is set).
- Teardown: Run teardown actions (if any).
- Save: Write the result to the report.
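The lifecycle above reads naturally as a sequence of awaited stages. This is a minimal sketch, assuming each stage is an async hook. `JobHooks`, `runJob`, and `JobResult` are hypothetical names, not the AXIS internals. The `noScore` flag stands in for `--no-score`.

```typescript
interface JobResult {
  transcript: string;
  score: number | null; // null when scoring is skipped (--no-score)
}

// Hypothetical per-job hooks; names and signatures are illustrative only.
interface JobHooks<P> {
  setup(): Promise<void>; // scenario setup actions
  spawn(): Promise<P>; // start the agent process in its workspace
  capture(proc: P): Promise<string>; // stream and record the transcript
  score(transcript: string): Promise<number>; // judge + interaction signals
  teardown(): Promise<void>; // scenario teardown actions
  save(result: JobResult): Promise<void>; // write to the report
}

async function runJob<P>(hooks: JobHooks<P>, noScore = false): Promise<JobResult> {
  await hooks.setup();
  const proc = await hooks.spawn();
  const transcript = await hooks.capture(proc);
  const score = noScore ? null : await hooks.score(transcript);
  await hooks.teardown();
  const result: JobResult = { transcript, score };
  await hooks.save(result);
  return result;
}
```

This page does not say whether teardown still runs when an earlier stage throws. A runner that needs that guarantee could move `teardown()` into a `finally` block.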
Each scenario has a 15-minute timeout by default (configurable via limits.scenario.time_minutes
or per-scenario limits). If the agent does not finish in time, AXIS sends SIGTERM,
waits briefly, then sends SIGKILL. Timed-out runs are marked as failed with a timeout error.
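The SIGTERM-then-SIGKILL escalation can be sketched with Node's `child_process` module (`ChildProcess.kill` and the `"exit"` event are standard Node APIs). This is a minimal sketch, not the AXIS implementation. `waitWithTimeout` is a hypothetical helper, and the 5-second grace period is an assumption, since the page only says AXIS "waits briefly".

```typescript
import { spawn, type ChildProcess } from "node:child_process";

interface RunOutcome {
  timedOut: boolean;
  code: number | null; // exit code, or null if the process died from a signal
  signal: string | null; // e.g. "SIGTERM" or "SIGKILL" when killed
}

// Resolve when `child` exits. If it is still running after `timeoutMs`,
// send SIGTERM; if it survives a further `graceMs`, send SIGKILL.
function waitWithTimeout(
  child: ChildProcess,
  timeoutMs: number,
  graceMs = 5_000, // hypothetical grace period
): Promise<RunOutcome> {
  return new Promise((resolve) => {
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const termTimer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGTERM"); // ask the agent to shut down cleanly
      killTimer = setTimeout(() => child.kill("SIGKILL"), graceMs); // then force it
    }, timeoutMs);

    child.once("exit", (code, signal) => {
      clearTimeout(termTimer);
      if (killTimer !== undefined) clearTimeout(killTimer);
      resolve({ timedOut, code, signal });
    });
  });
}

// Example with the documented 15-minute default (my-cli is the placeholder CLI used on this page):
// const outcome = await waitWithTimeout(spawn("my-cli", ["prompt"]), 15 * 60_000);
```

A production runner would also listen for the child's `"error"` event. For example, a missing executable emits `"error"` and may never emit `"exit"`.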
Supported Agents
AXIS ships native adapters for Claude Code, Codex, and Gemini, plus every major AI coding agent that speaks the Agent Client Protocol (ACP). See Built-in Agents for the full list with required environment variables.
Custom Agents
You can test any agent by creating a custom agent module. Use the createAgentAdapter()
factory for agents that produce NDJSON or plain text streams, or
createAcpBasedAdapter() for agents that speak the Agent Client Protocol (any CLI
with an --acp mode). Register the module in your config.
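// adapters/my-agent.ts
import { createAgentAdapter } from "@netlify/axis";
// The type parameter describes the per-run state the adapter keeps.
export default createAgentAdapter<{ stdout: string }>({
  name: "my-agent",
  // Executable to launch for this agent.
  resolveCommand: () => ({ command: "my-cli", prefixArgs: [] }),
  // Pass the scenario prompt as the CLI's only argument.
  buildArgs: (input) => [input.prompt],
  initialState: () => ({ stdout: "" }),
  streamConfig: {
    // Plain-text output: receive raw chunks rather than parsed JSON lines.
    mode: "aggregate",
    onChunk: (chunk, ctx) => {
      ctx.state.stdout += chunk;
    },
  },
  // Report the trimmed output as the final result, or null if the agent printed nothing.
  getResult: (ctx) => ({
    result: ctx.state.stdout.trim() || null,
  }),
});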
Register it in axis.config.json:
{
  "adapters": {
    "my-agent": "./adapters/my-agent.ts"
  },
  "agents": ["my-agent"]
}
Stream Modes
Custom agents support two modes for processing output:
- Lines mode: For agents that emit NDJSON (one JSON object per line). AXIS parses each line and passes the parsed object to your onLine handler. The native claude-code and codex adapters use this mode. ACP-based adapters bypass streamConfig entirely; the ACP SDK handles framing.
- Aggregate mode: For agents that emit plain text or non-JSON output. Raw chunks are passed to your onChunk handler, which accumulates them in state. Use this for agents with custom output formats or simple stdout capture.
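The difference between the two modes comes down to framing. In lines mode, chunk boundaries do not line up with JSON objects, so a partial line must be buffered until its newline arrives. The sketch below is a standalone illustration of that, not the AXIS stream handler. `createFramer` is a hypothetical name, and skipping non-JSON lines is a design choice made only for this sketch.

```typescript
type StreamMode = "lines" | "aggregate";

interface Handlers {
  onLine?: (obj: unknown) => void; // lines mode: one parsed JSON object per line
  onChunk?: (chunk: string) => void; // aggregate mode: raw text, unparsed
}

function createFramer(mode: StreamMode, handlers: Handlers) {
  let buffer = "";

  const emitLine = (line: string): void => {
    if (line.trim() === "") return;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      return; // this sketch silently skips lines that are not valid JSON
    }
    handlers.onLine?.(parsed);
  };

  return {
    push(chunk: string): void {
      if (mode === "aggregate") {
        handlers.onChunk?.(chunk);
        return;
      }
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // keep the trailing partial line for the next chunk
      lines.forEach(emitLine);
    },
    end(): void {
      // Flush a final line that arrived without a trailing newline.
      if (mode === "lines") emitLine(buffer);
      buffer = "";
    },
  };
}
```

For example, an object split across two chunks is delivered once, intact, after the second chunk completes its line.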
The module must export an AgentAdapter either as the default export or as a named export called
adapter.
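The export rule above amounts to a simple fallback when the module is loaded. This is a minimal sketch, assuming a dynamically imported module object. `AdapterLike` and `resolveAdapterExport` are hypothetical names, and checking `default` before `adapter` is an assumed precedence, not documented behavior.

```typescript
// Hypothetical minimal shape; the real AgentAdapter type has more fields.
interface AdapterLike {
  name: string;
}

interface AdapterModule {
  default?: AdapterLike;
  adapter?: AdapterLike;
}

// Accept `export default ...` or `export const adapter = ...`.
function resolveAdapterExport(mod: AdapterModule, modulePath: string): AdapterLike {
  const adapter = mod.default ?? mod.adapter;
  if (!adapter) {
    throw new Error(`${modulePath} must export an AgentAdapter as default or as "adapter"`);
  }
  return adapter;
}
```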
Workspace Isolation
Each agent run gets a fresh temporary directory as its workspace. AXIS isolates the following to prevent configuration leakage and cross-run interference:
- HOME directory: Set to a per-job sibling directory of the workspace (not the workspace itself), so each agent's own config files (.claude/, .codex/, etc.) live alongside the workspace and never appear in the directory the agent is scanning.
- Agent-specific dirs: CLAUDE_CONFIG_DIR, CODEX_HOME, and GEMINI_CLI_HOME are all set to isolated paths inside the per-job HOME.
- Environment variables: Only explicitly listed vars and system essentials (PATH, USER, SHELL, LANG, TERM, TMPDIR) are passed through.
- AXIS_CONFIG_DIR: Set to the absolute path of the directory containing axis.config.{json,ts,js,mjs}. Lifecycle scripts and the agent process can use it to reference versioned fixtures or helper scripts. See Lifecycle environment variables.
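The isolation rules above can be expressed as a pure function that builds the child environment. This is a minimal sketch, assuming a per-job root directory that holds both the workspace and HOME as siblings. `buildIsolatedEnv`, `jobRoot`, and the `workspace`, `home`, `.claude`, `.codex`, and `.gemini` subpaths are all hypothetical; the page only guarantees that the agent-specific dirs sit inside the per-job HOME.

```typescript
import * as path from "node:path";

// System essentials that always pass through, per the list above.
const SYSTEM_ESSENTIALS = ["PATH", "USER", "SHELL", "LANG", "TERM", "TMPDIR"];

interface IsolationInput {
  parentEnv: Record<string, string | undefined>;
  allowList: string[]; // variables explicitly listed in the config
  jobRoot: string; // hypothetical per-job directory containing workspace and HOME
  configDir: string; // directory containing axis.config.*
}

function buildIsolatedEnv({ parentEnv, allowList, jobRoot, configDir }: IsolationInput) {
  const workspace = path.join(jobRoot, "workspace");
  const home = path.join(jobRoot, "home"); // sibling of the workspace, not inside it
  const env: Record<string, string> = {};

  // Copy only allowed variables; everything else in the parent env is dropped.
  for (const key of [...SYSTEM_ESSENTIALS, ...allowList]) {
    const value = parentEnv[key];
    if (value !== undefined) env[key] = value;
  }

  // Isolation variables are set last so the allow-list cannot override them.
  env.HOME = home;
  env.CLAUDE_CONFIG_DIR = path.join(home, ".claude"); // subpath is an assumption
  env.CODEX_HOME = path.join(home, ".codex"); // subpath is an assumption
  env.GEMINI_CLI_HOME = path.join(home, ".gemini"); // subpath is an assumption
  env.AXIS_CONFIG_DIR = path.resolve(configDir);

  return { workspace, home, env };
}
```

Keeping HOME outside the workspace means an agent listing its working directory never sees its own `.claude/` or `.codex/` folders.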
For NDJSON-style agents, MCP server configuration files are written into each agent's isolated
config directory (under the per-job HOME) before spawn, in the format native to each CLI.
ACP-based adapters pass MCP servers through the ACP session/new call instead. See
MCP Servers in the configuration reference.
Multi-variant Scenarios
A single scenario file can produce multiple jobs by defining variants. Each variant runs as an independent job with its own key, inheriting the base scenario's fields and applying any overrides. This is useful for testing the same task under different tool configurations, prompts, or agent restrictions without duplicating scenario files.
For example, a scenario with two variants and two agents produces four jobs (2 variants × 2
agents). Each variant appears as a separate row in the CLI output and a separate entry in
reports, identified by its @-suffixed key (e.g., create-post@with-mcp).
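Variant expansion can be sketched as "merge overrides onto the base, then cross with agents". This is a minimal sketch, assuming variants are a map from variant name to partial overrides. The `Scenario` shape, `expandVariants`, and the `no-mcp` variant name are hypothetical; see Writing Scenarios → Variants for the real fields.

```typescript
// Hypothetical scenario shape for illustration only.
interface Scenario {
  key: string;
  prompt: string;
  tools?: string[];
  variants?: Record<string, Partial<Omit<Scenario, "key" | "variants">>>;
}

interface VariantJob {
  key: string; // e.g. "create-post@with-mcp"
  agent: string;
  scenario: Omit<Scenario, "variants">;
}

function expandVariants(base: Scenario, agents: string[]): VariantJob[] {
  const { variants, ...fields } = base;
  // Each variant inherits the base fields and applies its overrides on top.
  const resolved: Omit<Scenario, "variants">[] = variants
    ? Object.entries(variants).map(([name, overrides]) => ({
        ...fields,
        ...overrides,
        key: `${base.key}@${name}`,
      }))
    : [fields]; // no variants: a single job per agent under the plain key
  return resolved.flatMap((scenario) =>
    agents.map((agent) => ({ key: scenario.key, agent, scenario })),
  );
}
```

With two variants and two agents this yields four jobs, matching the example above.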
Report manifest entries include a failed boolean computed from the full run output before
transcripts are stripped from report.json. This preserves the correct status for agents that
return a final result even when their process exits non-zero during cleanup.
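The ordering matters because the evidence of success may live in the transcript itself. This is a minimal sketch, assuming the final result appears as a `"result"` event in the transcript. `RunOutput`, `toManifestEntry`, and the failure rule are hypothetical, and the real report.json schema may differ. Note that the exit code is deliberately not consulted.

```typescript
// Hypothetical shapes; not the actual report schema.
interface TranscriptEvent {
  type: string;
  text?: string;
}

interface RunOutput {
  key: string;
  agent: string;
  exitCode: number | null;
  timedOut: boolean;
  transcript: TranscriptEvent[];
}

type ManifestEntry = Omit<RunOutput, "transcript"> & { failed: boolean };

function hasFinalResult(run: RunOutput): boolean {
  return run.transcript.some((e) => e.type === "result" && !!e.text);
}

function toManifestEntry(run: RunOutput): ManifestEntry {
  // Decide status while the transcript is still available. A non-zero exit
  // during cleanup does not fail a run that already returned a result.
  const failed = run.timedOut || !hasFinalResult(run);
  // Strip the transcript only afterwards, since it is not written to report.json.
  const { transcript: _stripped, ...rest } = run;
  return { ...rest, failed };
}
```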
See Writing Scenarios → Variants for the full field reference and examples.