Reports & Baselines

Every AXIS run produces a report with full scoring breakdowns and interaction transcripts. Baselines let you snapshot scores and detect regressions over time.

Understanding Reports

Every run automatically saves a report to .axis/reports/. Each report is a directory containing a manifest and per-scenario result files.

.axis/reports/{reportId}/
  report.json                              # Manifest with summary + metadata
  report.html                              # Visual report (after scoring)
  scenarios/{key}/{agent}.json             # Full result with transcript + scores
  scenarios/{key}/{agent}.raw.ndjson       # Raw agent stdout
  scenarios/{key}/{agent}.sparse-index.txt # Compressed transcript for scoring
  scenarios/{key}/{agent}/artifacts/       # Captured files when artifacts is configured
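Scripts that walk this layout need to resolve each file for a given scenario/agent pair. The sketch below only mirrors the directory tree above. The helper names scenarioPaths and parseNdjson are hypothetical and not part of AXIS. The NDJSON parser also assumes that every non-empty line of the raw stdout file is one JSON value, which is what the .ndjson extension suggests.

```typescript
import { join } from "node:path";

// Hypothetical helper (not part of the AXIS API): resolve the files for one
// scenario/agent pair, mirroring the .axis/reports/{reportId}/ layout.
function scenarioPaths(reportsRoot: string, reportId: string, scenario: string, agent: string) {
  const dir = join(reportsRoot, reportId, "scenarios", scenario);
  return {
    manifest: join(reportsRoot, reportId, "report.json"), // summary + metadata
    result: join(dir, `${agent}.json`), // full result: transcript + scores
    rawStdout: join(dir, `${agent}.raw.ndjson`), // raw agent stdout
    sparseIndex: join(dir, `${agent}.sparse-index.txt`), // compressed transcript
    artifacts: join(dir, agent, "artifacts"), // only present when artifacts is configured
  };
}

// Parse newline-delimited JSON, assuming each non-empty line is one JSON value.
// JSON.parse tolerates a trailing "\r", so CRLF files also work.
function parseNdjson(text: string): unknown[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}
```

Returning plain paths keeps the helper independent of how you read the files: synchronously in a one-off script, or with streams for large raw stdout logs.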

The manifest

report.json contains the run metadata and a summary of every scenario/agent result: the composite AXIS Result, per-dimension scores, token usage, duration, and any error messages. This is the file you read when scripting against AXIS output.
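A typical scripting task is to pull the weak or failed results out of the manifest. This is a minimal sketch that assumes a hypothetical manifest shape. This page does not document the real report.json schema, so the ResultSummary and Manifest field names (results, scenario, agent, score, error) are illustrative only. Inspect an actual report.json and adjust them before relying on this.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical manifest shape: these field names are assumptions, not the
// documented report.json schema.
interface ResultSummary {
  scenario: string;
  agent: string;
  score: number; // composite AXIS Result
  error?: string; // error message, if the run failed
}

interface Manifest {
  results: ResultSummary[];
}

// Read .axis/reports/{reportId}/report.json from disk.
function loadManifest(reportDir: string): Manifest {
  return JSON.parse(readFileSync(join(reportDir, "report.json"), "utf8")) as Manifest;
}

// Keep results that errored or whose composite score is below minScore.
function findWeakResults(manifest: Manifest, minScore: number): ResultSummary[] {
  return manifest.results.filter((r) => r.error !== undefined || r.score < minScore);
}

// Example: findWeakResults(loadManifest(".axis/reports/2026-04-15-143022"), 70)
```

Keeping loading and filtering separate means the filter also works on the output of reports latest --json once it is parsed, provided the shapes match.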

Scenario files

Each {agent}.json file under scenarios/ contains the full result for one scenario/agent combination: the complete interaction transcript, judge evaluations, per-interaction signal scores, and the judge assessment.

Viewing Reports

# List all reports
npx @netlify/axis reports

# View the latest report summary
npx @netlify/axis reports latest

# View a specific scenario detail
npx @netlify/axis reports latest hello-world

# Filter by agent
npx @netlify/axis reports latest --agent claude-code

HTML reports

Open the visual report in your browser for the richest view:

npx @netlify/axis reports latest --html

The HTML report includes:

  • Composite and per-dimension score breakdowns with visual indicators.
  • The full interaction transcript with tool calls and results.
  • Judge evaluations for each judge check and interaction signal.
  • Score insights identifying the weakest signals for low-scoring dimensions.
  • A captured-file tree per run when scenarios configure artifacts — preview text/images in a modal, download individual files, or grab everything as a .zip.
  • Markdown "Setup notes" and "Teardown notes" panels per run when lifecycle scripts write to $AXIS_OUTPUT — useful for capturing workspace state, external probes, or diagnostic context alongside each scored run.
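A lifecycle script only has to emit Markdown to produce those panels. This is a minimal sketch in TypeScript under two assumptions that this page does not confirm: that $AXIS_OUTPUT holds a file path, and that Markdown appended to that file becomes the run's notes panel. The helper names formatNotes and writeNotes are hypothetical. A shell script that appends to the same path would work just as well.

```typescript
import { appendFileSync } from "node:fs";

// Hypothetical helper: render a titled list of facts as Markdown.
function formatNotes(title: string, facts: Record<string, string>): string {
  const bullets = Object.keys(facts).map((key) => `- **${key}**: ${facts[key]}`);
  return `### ${title}\n\n${bullets.join("\n")}\n`;
}

// Assumption: $AXIS_OUTPUT is a file path that AXIS reads after the script runs.
function writeNotes(markdown: string, target = process.env.AXIS_OUTPUT): boolean {
  if (!target) return false; // not running under an AXIS lifecycle script
  appendFileSync(target, markdown);
  return true;
}

// Only writes when $AXIS_OUTPUT is set, so running the script by hand is harmless.
writeNotes(formatNotes("Workspace state", { node: process.version, cwd: process.cwd() }));
```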

JSON output

For scripting and CI integration, use --json to get machine-readable output:

npx @netlify/axis reports latest --json

Baselines

Baselines snapshot your scores at a point in time. Comparing later runs against a baseline detects regressions: scores that dropped by more than the noise tolerance of 1 point.

Setting a baseline

# Save from the latest report
npx @netlify/axis baseline set

# Save with a name (for multiple baselines)
npx @netlify/axis baseline set v1.0

# Save from a specific report (report IDs use YYYY-MM-DD-HHMMSS format)
npx @netlify/axis baseline set --from 2026-04-15-143022
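When a script picks the report ID to pass to --from, it helps to build and validate IDs in the documented YYYY-MM-DD-HHMMSS format. The sketch below assumes UTC. AXIS may stamp reports in local time, so compare against the names in .axis/reports/ before relying on it. The helpers are hypothetical, and newestReportId is not necessarily how AXIS resolves latest.

```typescript
// Zero-pad to two digits without relying on newer string helpers.
const pad2 = (n: number): string => (n < 10 ? "0" : "") + String(n);

// Assumption: UTC timestamps. AXIS itself may use local time.
function toReportId(date: Date): string {
  const day = `${date.getUTCFullYear()}-${pad2(date.getUTCMonth() + 1)}-${pad2(date.getUTCDate())}`;
  const time = `${pad2(date.getUTCHours())}${pad2(date.getUTCMinutes())}${pad2(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

function isReportId(id: string): boolean {
  return /^\d{4}-\d{2}-\d{2}-\d{6}$/.test(id);
}

// Fields are zero-padded and ordered from year down to second, so sorting the
// IDs as plain strings also sorts them chronologically.
function newestReportId(ids: string[]): string | undefined {
  const sorted = ids.filter(isReportId).sort();
  return sorted[sorted.length - 1];
}
```

The fixed-width, most-significant-first layout is what makes the plain string sort safe. Without zero padding, "2026-4-15" would sort after "2026-12-01".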

Comparing against a baseline

# Compare during a run (automatic)
npx @netlify/axis run --compare-baseline

# Compare explicitly after a run
npx @netlify/axis baseline compare

# Compare against a named baseline
npx @netlify/axis baseline compare v1.0

The comparison shows the delta for each score. Changes within the noise tolerance (0 to 1 point) are reported as unchanged. Regressions are highlighted, and the command exits with code 1 if any are detected.
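The comparison rule can be sketched as follows. Only the 1-point tolerance and the exit-code contract come from this page. The "improved" label, keying scores by a "scenario/agent" string, and skipping scenarios that are missing from the current run are all assumptions, not documented AXIS behavior.

```typescript
const NOISE_TOLERANCE = 1; // points, as documented

type Verdict = "regression" | "unchanged" | "improved";

function classify(baseline: number, current: number): Verdict {
  const delta = current - baseline;
  if (delta < -NOISE_TOLERANCE) return "regression"; // dropped by more than 1 point
  if (delta > NOISE_TOLERANCE) return "improved"; // hypothetical label
  return "unchanged"; // within the 0 to 1 point tolerance
}

// Scores keyed by a hypothetical "scenario/agent" string. Keys missing from the
// current run are skipped here; how AXIS treats them is not specified.
function compareScores(
  baseline: Record<string, number>,
  current: Record<string, number>,
): { verdicts: Record<string, Verdict>; exitCode: number } {
  const verdicts: Record<string, Verdict> = {};
  for (const key of Object.keys(baseline)) {
    if (key in current) verdicts[key] = classify(baseline[key], current[key]);
  }
  // Mirror the CLI contract: exit code 1 if any regression was detected.
  const regressed = Object.keys(verdicts).some((k) => verdicts[k] === "regression");
  return { verdicts, exitCode: regressed ? 1 : 0 };
}
```

For example, a drop from 81 to 80 is exactly 1 point and counts as unchanged, while a drop from 81 to 79.5 is a regression.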

When to set baselines

  • After establishing a good score: Run your scenarios, review the results, and if you are satisfied, save the baseline. This becomes your quality floor.
  • After intentional changes: If you change your project structure, APIs, or agent configuration and scores change as expected, update the baseline to reflect the new normal.
  • Named baselines for releases: Use named baselines (baseline set v2.0) to track scores across major versions.

Managing baselines

# List all baselines
npx @netlify/axis baseline list

# View baseline contents
npx @netlify/axis baseline show

# Delete a baseline
npx @netlify/axis baseline delete v1.0

CI Integration

AXIS is designed to run in CI environments. The key patterns:

  • --json: Machine-readable output to stdout. No live terminal display, no color codes. Suitable for piping to other tools or saving as artifacts.
  • --compare-baseline: Exits with code 1 if regressions are detected. Use this as a CI gate: the build fails if agent experience degrades.
  • --concurrency: Control resource usage in constrained CI environments.
  • API keys via environment: Pass ANTHROPIC_API_KEY, CODEX_API_KEY, or GEMINI_API_KEY as CI secrets. In CI you should always set explicit API keys: claude-code and codex have a local-login fallback for laptop use, but that path is unsuitable for CI because it bills against an individual subscription rather than a service account.

GitHub Actions example

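# Steps inside a GitHub Actions job
# Check out the repo so committed baselines in .axis/baselines/ are available
- uses: actions/checkout@v4
- name: Run AXIS tests
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  # Exits with code 1 on regressions, which fails the step
  run: npx @netlify/axis run --json --compare-baseline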

Report Storage

Baselines are stored in .axis/baselines/ and are designed to be checked into version control so that your team shares the same regression thresholds.

Reports and cached remote scenarios should not be committed:

# .gitignore
.axis/reports/
.axis/remotes/

AXIS is open source software maintained by Netlify and its open source contributors.