
Benchmarking

clawzero includes a reproducible Docker-based benchmark environment for performance comparison against Claude Code and OpenClaw.

Quick Start

# Run all tools × all scenarios
docker compose -f bench/docker-compose.yml run bench

# Measure clawzero startup time only
docker compose -f bench/docker-compose.yml run bench --tools clawzero --scenarios startup

# Specify iteration count
docker compose -f bench/docker-compose.yml run bench --iterations 10

Prerequisites

  • Docker / Docker Compose
  • ANTHROPIC_API_KEY environment variable (required for API scenarios)
  • OPENAI_API_KEY environment variable (optional, for OpenClaw)

Metrics

| Metric | Measurement Method |
|---|---|
| Startup time (cold start) | Wall-clock time of --help execution via hyperfine |
| TTFT (Time to First Token) | Time until first stdout byte via custom wrapper |
| E2E completion time | Wall-clock time of prompt execution via hyperfine |
| Memory usage (peak RSS) | Maximum resident set size via /usr/bin/time -v |
| Token throughput | Output characters / E2E time |
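To illustrate the TTFT approach: the wrapper can time the gap between launching a command and the arrival of its first stdout byte. This is a minimal sketch of the idea, not the actual measure_ttft.sh:

```shell
#!/usr/bin/env bash
# Sketch: time from command launch until its first stdout byte arrives.
# Illustrative only; the real measure_ttft.sh may differ.
measure_ttft() {
    local start_ns end_ns
    start_ns=$(date +%s%N)
    # head -c 1 exits after the first byte, ending the timing window.
    "$@" 2>/dev/null | head -c 1 > /dev/null
    end_ns=$(date +%s%N)
    echo $(( (end_ns - start_ns) / 1000000 ))   # milliseconds
}

# Example: TTFT of a trivial local command is near zero.
measure_ttft echo hello
```

Note that `date +%s%N` relies on GNU coreutils, which is a safe assumption inside the Docker benchmark image.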

Scenarios

| Scenario | Description | API Call |
|---|---|---|
| startup | --help execution time | No |
| simple | Response to "What is 1+1?" | Yes |
| tool_use | File read + line count | Yes |

File Structure

bench/
├── Dockerfile              # Multi-stage build
├── docker-compose.yml      # Environment variables and volume mounts
├── run.sh                  # Main benchmark runner
├── adapters/
│   ├── clawzero.sh         # clawzero invocation adapter
│   ├── claude-code.sh      # Claude Code invocation adapter
│   └── openclaw.sh         # OpenClaw invocation adapter
├── measure_ttft.sh         # TTFT measurement helper
├── fixtures/
│   └── bench_input.txt     # Test file for tool_use scenario
└── results/                # Output directory (gitignored)

run.sh Options

--tools <t1,t2,...>       Tools to benchmark (default: clawzero,claude-code,openclaw)
--scenarios <s1,s2,...>   Scenarios to run (default: startup,simple,tool_use)
--iterations <N>          Iteration count (default: $BENCH_ITERATIONS or 5)
--results-dir <path>      Output directory (default: bench/results)
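A minimal sketch of how options like these can be parsed with the documented defaults (illustrative; the actual run.sh may implement this differently):

```shell
# Illustrative option parsing using the defaults documented above;
# the real run.sh may differ.
TOOLS="clawzero,claude-code,openclaw"
SCENARIOS="startup,simple,tool_use"
ITERATIONS="${BENCH_ITERATIONS:-5}"
RESULTS_DIR="bench/results"

while [ $# -gt 0 ]; do
    case "$1" in
        --tools)       TOOLS="$2"; shift 2 ;;
        --scenarios)   SCENARIOS="$2"; shift 2 ;;
        --iterations)  ITERATIONS="$2"; shift 2 ;;
        --results-dir) RESULTS_DIR="$2"; shift 2 ;;
        *) echo "unknown option: $1" >&2; exit 1 ;;
    esac
done

echo "tools=$TOOLS scenarios=$SCENARIOS iterations=$ITERATIONS"
```

The `${BENCH_ITERATIONS:-5}` expansion is what gives the environment variable precedence over the built-in default while still letting --iterations override both.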

Environment Variables

| Variable | Description | Default |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic API key | (required) |
| OPENAI_API_KEY | OpenAI API key | (optional) |
| BENCH_ITERATIONS | Iteration count | 5 |
| BENCH_MODEL | Model used by clawzero | anthropic/claude-sonnet-4-5-20250929 |

Results

Results are saved to bench/results/<timestamp>/:

  • results.json — All metrics in JSON format
  • <tool>_<scenario>_hyperfine.json — hyperfine raw data
  • <tool>_<scenario>_time.txt — /usr/bin/time output
  • <tool>_<scenario>_ttft.csv — TTFT CSV data

A summary table is printed to the console when execution completes.
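Because the subdirectories are named by timestamp, the latest run can be located with a plain lexicographic sort. A convenience sketch, assuming sortable timestamp directory names:

```shell
# Find the most recent results directory, assuming timestamped
# subdirectory names that sort lexicographically (e.g. 20250101-120000).
latest=$(ls -d bench/results/*/ 2>/dev/null | sort | tail -n 1)
echo "latest results: ${latest:-none}"
```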

Adding New Adapters

To add a new tool, create bench/adapters/<name>.sh defining a TOOL_NAME variable and one cmd_<scenario> function per scenario:

TOOL_NAME="my-tool"

cmd_startup() {
    my-tool --help
}

cmd_simple() {
    my-tool "What is 1+1?"
}

cmd_tool_use() {
    my-tool "Read /tmp/bench_input.txt and count the lines"
}

Pass --tools my-tool and the adapter is loaded automatically.
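Once an adapter is sourced, the runner can dispatch to its functions by name. A minimal sketch of that cmd_<scenario> convention, with stub functions standing in for a real adapter:

```shell
# Stub functions standing in for a sourced adapter.
cmd_startup()  { echo "startup ran"; }
cmd_simple()   { echo "simple ran"; }
cmd_tool_use() { echo "tool_use ran"; }

# Dispatch by scenario name: cmd_<scenario>.
run_scenario() {
    "cmd_$1"
}

run_scenario tool_use   # prints "tool_use ran"
```

This naming convention is why new adapters need no registration step: defining the functions is enough for the runner to find them.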