18. Benchmarking (Docker)

Purpose

Define a reproducible, deterministic benchmark workflow for CodeClone in Docker.

Benchmark output (benchmark_schema_version=1.0) contains:

- tool metadata (name, version, python_tag)
- benchmark config (target, runs, warmups)
- execution environment (platform, CPU limits/affinity, cgroup limits)
- scenario results:
  - cold_full (cold cache each run)
  - warm_full (shared warm cache)
  - warm_clones_only (shared warm cache with --skip-metrics)
- latency stats per scenario (min, max, mean, median, p95, stdev)
- deterministic digest check (integrity.digest.value must be stable within scenario)
- cross-scenario comparisons (speedup ratios)
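
The per-scenario latency stats and speedup ratios listed above can be illustrated with a minimal sketch. The function names, the nearest-rank p95 method, and the choice of sample standard deviation are assumptions made for illustration; the benchmark runner may compute these values differently.

```python
import math
import statistics


def latency_stats(samples):
    """Summarize wall-clock timings (seconds) of the measured runs of one scenario."""
    if not samples:
        raise ValueError("at least one measured run is required")
    ordered = sorted(samples)
    # Nearest-rank p95 (assumption): the smallest sample such that at least
    # 95% of all samples are less than or equal to it.
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        # Sample stdev needs at least two points; report 0.0 for a single run.
        "stdev": statistics.stdev(ordered) if len(ordered) > 1 else 0.0,
    }


def speedup(baseline_median, candidate_median):
    """Ratio above 1.0 means the candidate scenario is faster than the baseline."""
    if candidate_median <= 0:
        raise ValueError("candidate median must be positive")
    return baseline_median / candidate_median
```

For example, `speedup(cold["median"], warm["median"])` would express how much faster a warm scenario is than cold_full, assuming medians are the basis for the comparison.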

Benchmark must run in a containerized, isolated environment.
CPU/memory limits are pinned at container run time (--cpuset-cpus, --cpus, --memory).
Runtime environment is normalized: PYTHONHASHSEED=0, TZ=UTC, LC_ALL/LANG=C.UTF-8.
Each measured run must exit successfully (exit=0); any failure aborts the benchmark.
Determinism guard: if scenario digest diverges across measured runs, benchmark fails.
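
As a rough sketch of how the pinned limits and the normalized environment translate into a container invocation, the helper below assembles a `docker run` argument list. The flags (--cpuset-cpus, --cpus, --memory, --user, -e) are standard Docker options; the helper name, the image name, the in-container command, and the default values (taken from the override example in this chapter) are hypothetical and not the wrapper's actual code.

```python
# Environment normalization values as stated in the contract above.
NORMALIZED_ENV = {
    "PYTHONHASHSEED": "0",
    "TZ": "UTC",
    "LC_ALL": "C.UTF-8",
    "LANG": "C.UTF-8",
}


def build_docker_argv(image, command, cpuset="0", cpus="1.0", memory="2g", user=None):
    """Return an argv list for `docker run` with pinned CPU/memory limits."""
    argv = [
        "docker", "run", "--rm",
        "--cpuset-cpus", cpuset,  # pin to specific CPU cores
        "--cpus", cpus,           # cap CPU time share
        "--memory", memory,       # cap memory
    ]
    if user:
        argv += ["--user", user]  # e.g. "1000:1000" for host uid:gid
    for key, value in NORMALIZED_ENV.items():
        argv += ["-e", f"{key}={value}"]
    return argv + [image, *command]


if __name__ == "__main__":
    # Hypothetical image and command names, for illustration only.
    print(build_docker_argv("codeclone-benchmark:local", ["python", "-m", "benchmark"]))
```

Passing the arguments as a list (for example to `subprocess.run(argv, check=True)`) avoids shell quoting issues and makes a non-zero exit raise immediately, which matches the abort-on-failure rule.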

Cold scenario uses a fixed cache path and removes the cache file before each run, so the cache is cold on every run while the canonical metadata path stays stable.
Warm scenarios seed one shared cache file before warmups/measured runs.
Benchmark JSON write is atomic (.tmp + replace).
Benchmark scenario ordering is stable and fixed.
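
The atomic write (.tmp + replace) can be sketched as follows. This is a minimal illustration, assuming the temporary file lives next to the target so that the rename stays on one filesystem; the function name is hypothetical and the runner's real implementation may differ in detail.

```python
import json
import os
from pathlib import Path


def write_json_atomic(path, payload):
    """Write JSON to a sibling .tmp file, then atomically replace the target."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as handle:
        # indent=2 keeps the serialized output stable and diff-friendly.
        json.dump(payload, handle, indent=2)
        handle.write("\n")
        handle.flush()
        os.fsync(handle.fileno())  # push bytes to disk before the rename
    # os.replace overwrites the destination in one step, so readers see
    # either the old file or the complete new one, never a partial write.
    os.replace(tmp, target)
```

A reader polling the output path (for example a CI upload step) therefore never observes a truncated benchmark file.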

Condition	Behavior
Docker unavailable	Host wrapper fails fast
Non-zero CLI exit in any run	Runner aborts with command stdout/stderr tail
Missing/invalid report integrity digest	Runner aborts as invalid benchmark sample
Digest mismatch in one scenario	Runner aborts as non-deterministic
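
The two digest-related failure modes in the table can be sketched as a small guard. The exception class, function names, and message wording are hypothetical; only the digest location (integrity.digest.value inside the report) comes from this chapter.

```python
class BenchmarkError(RuntimeError):
    """Raised when a benchmark sample is invalid or non-deterministic."""


def extract_digest(report):
    """Return report["integrity"]["digest"]["value"] or raise if missing/invalid."""
    node = report
    for key in ("integrity", "digest", "value"):
        if not isinstance(node, dict) or key not in node:
            raise BenchmarkError("invalid benchmark sample: missing integrity digest")
        node = node[key]
    if not isinstance(node, str) or not node:
        raise BenchmarkError("invalid benchmark sample: invalid integrity digest")
    return node


def assert_deterministic(scenario, reports):
    """Require every measured run of a scenario to yield the same digest."""
    if not reports:
        raise BenchmarkError(f"{scenario}: no measured runs")
    digests = {extract_digest(report) for report in reports}
    if len(digests) != 1:
        raise BenchmarkError(
            f"{scenario}: non-deterministic, digests differ: {sorted(digests)}"
        )
    return next(iter(digests))
```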

Per-run determinism uses canonical report digest: report.integrity.digest.value.
Digest intentionally excludes the runtime timestamp (meta.runtime) from the canonical payload, so the determinism check stays valid even though timestamps differ between runs.
Output JSON is serialized with stable formatting (indent=2) and written atomically.
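
To illustrate why excluding a runtime field keeps the digest stable, here is a sketch under stated assumptions. It is not CodeClone's actual canonicalization: the use of SHA-256, sorted-key compact JSON, and dropping the integrity block before hashing are all assumptions chosen for the example.

```python
import copy
import hashlib
import json


def canonical_digest(report):
    """Hash a report after removing fields that legitimately vary per run."""
    payload = copy.deepcopy(report)  # never mutate the caller's report
    meta = payload.get("meta")
    if isinstance(meta, dict):
        meta.pop("runtime", None)  # runtime timestamp differs on every run
    payload.pop("integrity", None)  # assumption: the digest does not hash itself
    # Sorted keys and fixed separators give one byte sequence per logical payload.
    encoded = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

Under this scheme, two runs that differ only in meta.runtime hash to the same value, while any change to the analysis results changes the digest.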

Refs:

./benchmarks/run_docker_benchmark.sh

Useful overrides:

CPUSET=0 CPUS=1.0 MEMORY=2g RUNS=16 WARMUPS=4 \
  ./benchmarks/run_docker_benchmark.sh

Permissions note:

The host wrapper runs the container as the host uid:gid by default (--user "$(id -u):$(id -g)"), so that writes of benchmark artifacts to bind-mounted output paths behave consistently in CI.
Override explicitly if needed: CONTAINER_USER=10001:10001.

Workflow: .github/workflows/benchmark.yml
Triggers:
- manual (workflow_dispatch)
- pull requests targeting feat/2.0.0
Job behavior:
- runs Docker benchmark with pinned runner limits
- uploads .cache/benchmarks/codeclone-benchmark.json as artifact
- emits scenario table and ratio table into GITHUB_STEP_SUMMARY
- prints ratios in job logs (important for quick trend checks)

By contract, absolute timings are not comparable across hosts.
Throughput numbers can vary with host kernel, thermal state, and background load.