Reports & Baselines

Every AXIS run produces a report with full scoring breakdowns and interaction transcripts. Baselines let you snapshot scores and detect regressions over time.

Understanding Reports

Every run automatically saves a report to .axis/reports/. Each report is a directory containing a manifest and per-scenario result files.

.axis/reports/{reportId}/
  report.json                              # Manifest with summary + metadata
  report.html                              # Visual report (after scoring)
  scenarios/{key}/{agent}.json             # Full result with transcript + scores
  scenarios/{key}/{agent}.raw.ndjson       # Raw agent stdout
  scenarios/{key}/{agent}.sparse-index.txt # Compressed transcript for scoring
  scenarios/{key}/{agent}/artifacts/       # Captured files when artifacts is configured

The manifest

report.json contains the run metadata and a summary of every scenario/agent result: the composite AXIS Result, per-dimension scores, token usage, duration, and any error messages. This is the file you read when scripting against AXIS output.
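
As a minimal sketch of scripting against the manifest, the Python snippet below locates the newest report directory and loads its report.json. It assumes report IDs follow the YYYY-MM-DD-HHMMSS format mentioned under Baselines, so sorting directory names alphabetically also sorts them chronologically. load_latest_manifest is a hypothetical helper, not part of AXIS, and the snippet assumes nothing about the field names inside the manifest.

```python
import json
from pathlib import Path


def load_latest_manifest(reports_root=".axis/reports"):
    """Return (report_id, manifest) for the newest report, or None if none exist.

    Hypothetical helper: assumes report directory names sort chronologically.
    """
    root = Path(reports_root)
    if not root.is_dir():
        return None
    # Only directories that actually contain a manifest count as reports.
    report_dirs = sorted(
        (d for d in root.iterdir() if (d / "report.json").is_file()),
        key=lambda d: d.name,
    )
    if not report_dirs:
        return None
    latest = report_dirs[-1]
    with open(latest / "report.json", encoding="utf-8") as fh:
        return latest.name, json.load(fh)
```

The function returns the parsed JSON as-is, so inspect the keys of a real report.json before relying on any particular field.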

Scenario files

Each {agent}.json file under scenarios/ contains the full result for one scenario/agent combination: the complete interaction transcript, judge evaluations, per-interaction signal scores, and the judge assessment.
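
To walk these files from a script, a sketch like the following enumerates every scenario/agent result in one report. It relies only on the directory layout shown earlier; iter_scenario_results is a hypothetical helper name, and each file is returned as parsed JSON without assuming any particular fields.

```python
import json
from pathlib import Path


def iter_scenario_results(report_dir):
    """Yield (scenario_key, agent, result) for each scenarios/{key}/{agent}.json."""
    scenarios = Path(report_dir) / "scenarios"
    # "*/*.json" matches {agent}.json only: the .raw.ndjson and .sparse-index.txt
    # siblings have other suffixes, and captured artifacts live one level deeper.
    for path in sorted(scenarios.glob("*/*.json")):
        with open(path, encoding="utf-8") as fh:
            yield path.parent.name, path.stem, json.load(fh)
```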

Viewing Reports

# List all reports
npx @netlify/axis reports

# View the latest report summary
npx @netlify/axis reports latest

# View a specific scenario detail
npx @netlify/axis reports latest hello-world

# Filter by agent
npx @netlify/axis reports latest --agent claude-code

HTML reports

Open the visual report in your browser for the richest view:

npx @netlify/axis reports latest --html

The HTML report includes:

Composite and per-dimension score breakdowns with visual indicators.
The full interaction transcript with tool calls and results.
Judge evaluations for each judge check and interaction signal.
Score insights identifying the weakest signals for low-scoring dimensions.
A captured-file tree per run when scenarios configure artifacts — preview text/images in a modal, download individual files, or grab everything as a .zip.
Markdown "Setup notes" and "Teardown notes" panels per run when lifecycle scripts write to $AXIS_OUTPUT — useful for capturing workspace state, external probes, or diagnostic context alongside each scored run.

JSON output

For scripting and CI integration, use --json to get machine-readable output:

npx @netlify/axis reports latest --json
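
One way to consume this from Python is to run the command and parse its stdout. The sketch below assumes the command prints a single JSON document to stdout, as the --json description suggests. run_json is a hypothetical helper, and the structure of the parsed value is not assumed.

```python
import json
import subprocess


def run_json(command):
    """Run a command and parse its stdout as JSON.

    Returns (exit_code, parsed_output). The exit code is returned rather than
    raised so the caller decides how to treat a non-zero exit.
    """
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode, json.loads(proc.stdout)


# Example call (not executed here):
#   code, report = run_json(["npx", "@netlify/axis", "reports", "latest", "--json"])
```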

Baselines

Baselines snapshot your scores at a point in time. You compare future runs against a baseline to detect regressions: scores that dropped by more than the noise tolerance (1 point).

Setting a baseline

# Save from the latest report
npx @netlify/axis baseline set

# Save with a name (for multiple baselines)
npx @netlify/axis baseline set v1.0

# Save from a specific report (report IDs use YYYY-MM-DD-HHMMSS format)
npx @netlify/axis baseline set --from 2026-04-15-143022
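
If a script needs to reason about when a report was produced, the ID format can be parsed directly. This is a minimal sketch, assuming the ID is exactly YYYY-MM-DD-HHMMSS with no suffix. The time zone is not stated here, so the result is a naive datetime. parse_report_id is a hypothetical helper.

```python
from datetime import datetime


def parse_report_id(report_id):
    """Parse a report ID such as '2026-04-15-143022' into a naive datetime.

    Raises ValueError if the ID does not match the expected format.
    """
    return datetime.strptime(report_id, "%Y-%m-%d-%H%M%S")
```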

Comparing against a baseline

# Compare during a run (automatic)
npx @netlify/axis run --compare-baseline

# Compare explicitly after a run
npx @netlify/axis baseline compare

# Compare against a named baseline
npx @netlify/axis baseline compare v1.0

The comparison shows deltas for each score. Score changes within the noise tolerance (0 to 1 point) are reported as unchanged. Regressions are highlighted and the command exits with code 1 if any are detected.
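
The comparison rule can be illustrated with a short sketch. This is not the AXIS implementation. The function names, the "improved" label, and the handling of the exact 1-point boundary are all assumptions based on the description above: a drop of more than 1 point is a regression, and a change within 1 point is unchanged.

```python
NOISE_TOLERANCE = 1.0  # points, per the description above


def classify_delta(baseline, current, tolerance=NOISE_TOLERANCE):
    """Label a score change as 'regression', 'unchanged', or 'improved'."""
    delta = current - baseline
    if delta < -tolerance:
        return "regression"
    if delta > tolerance:
        return "improved"  # assumed label; the source only says deltas are shown
    return "unchanged"


def exit_code(pairs):
    """Return 1 if any (baseline, current) pair regressed, else 0."""
    return 1 if any(classify_delta(b, c) == "regression" for b, c in pairs) else 0
```

With these assumptions, a score moving from 80 to 79 is unchanged, while 80 to 78.5 is a regression and produces exit code 1.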

When to set baselines

After establishing a good score: Run your scenarios, review the results, and if you are satisfied, save the baseline. This becomes your quality floor.
After intentional changes: If you change your project structure, APIs, or agent configuration and scores change as expected, update the baseline to reflect the new normal.
Named baselines for releases: Use named baselines (baseline set v2.0) to track scores across major versions.

Managing baselines

# List all baselines
npx @netlify/axis baseline list

# View baseline contents
npx @netlify/axis baseline show

# Delete a baseline
npx @netlify/axis baseline delete v1.0

CI Integration

AXIS is designed to run in CI environments. The key patterns:

--json: Machine-readable output to stdout. No live terminal display, no color codes. Suitable for piping to other tools or saving as artifacts.
--compare-baseline: Exits with code 1 if regressions are detected. Use this as a CI gate: the build fails if agent experience degrades.
--concurrency: Control resource usage in constrained CI environments.
API keys via environment: Pass ANTHROPIC_API_KEY, CODEX_API_KEY, or GEMINI_API_KEY as CI secrets. In CI you should always set explicit API keys: claude-code and codex have a local-login fallback for laptop use, but that path is unsuitable for CI because it bills against an individual subscription rather than a service account.
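
A preflight check can make a missing secret fail fast instead of surfacing mid-run. The sketch below is one way you might wire this up, not an AXIS feature. missing_api_keys is a hypothetical helper, and which variables you require depends on the agents you run.

```python
import os


def missing_api_keys(required, env=None):
    """Return the names in `required` that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

A CI step that depends on ANTHROPIC_API_KEY could call missing_api_keys(["ANTHROPIC_API_KEY"]) and exit non-zero when the returned list is not empty.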

GitHub Actions example

# GitHub Actions example
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: npx @netlify/axis run --json --compare-baseline
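
If you want to keep the JSON output as a build artifact while still failing the build on regressions, the exit code has to be preserved. The following is a sketch of a wrapper script. run_and_save and the output file name are hypothetical, and it assumes the JSON is the only content written to stdout.

```python
import subprocess


def run_and_save(command, output_path):
    """Run a command, write its stdout to output_path, and return its exit code."""
    with open(output_path, "w", encoding="utf-8") as fh:
        proc = subprocess.run(command, stdout=fh)
    return proc.returncode


# Example (not executed here):
#   import sys
#   sys.exit(run_and_save(
#       ["npx", "@netlify/axis", "run", "--json", "--compare-baseline"],
#       "axis-results.json",
#   ))
```

Passing the return value to sys.exit keeps the gate behavior: the step fails with code 1 when regressions are detected, and the saved file can be uploaded by a later step.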

Report Storage

Baselines are stored in .axis/baselines/ and are designed to be checked into version control so your team shares the same regression thresholds.

Reports and cached remote scenarios should not be committed:

# .gitignore
.axis/reports/
.axis/remotes/