06. Baseline
Purpose
Specify baseline schema v2, trust/compatibility checks, integrity hashing, and runtime behavior.
Public surface
- Baseline object lifecycle: `codeclone/baseline.py:Baseline`
- Baseline statuses: `codeclone/baseline.py:BaselineStatus`
- Baseline status coercion: `codeclone/baseline.py:coerce_baseline_status`
- CLI integration: `codeclone/cli.py:_main_impl`
Data model
Canonical baseline shape:
- Required top-level keys: `meta`, `clones`
- Optional top-level key: `metrics` (unified baseline flow)
- `meta` required keys: `generator`, `schema_version`, `fingerprint_version`, `python_tag`, `created_at`, `payload_sha256`
- `clones` required keys: `functions`, `blocks`
- `functions` and `blocks` are sorted/unique `list[str]`
Refs:
- `codeclone/baseline.py:_TOP_LEVEL_REQUIRED_KEYS`
- `codeclone/baseline.py:_TOP_LEVEL_OPTIONAL_KEYS`
- `codeclone/baseline.py:_META_REQUIRED_KEYS`
- `codeclone/baseline.py:_CLONES_REQUIRED_KEYS`
- `codeclone/baseline.py:_require_sorted_unique_ids`
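The shape rules above can be sketched as a small validator. This is a minimal illustration, not the code in `codeclone/baseline.py`: the constant names, `is_sorted_unique`, and `shape_problems` are hypothetical, and the sketch treats every `meta` value as opaque.

```python
from typing import Any

# Hypothetical stand-ins for the key sets listed above; the real constants
# live in codeclone/baseline.py and may be defined differently.
TOP_LEVEL_REQUIRED = {"meta", "clones"}
META_REQUIRED = {
    "generator", "schema_version", "fingerprint_version",
    "python_tag", "created_at", "payload_sha256",
}
CLONES_REQUIRED = {"functions", "blocks"}


def is_sorted_unique(ids: Any) -> bool:
    """True when ids is a list of strings in strictly increasing order."""
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        return False
    # Strictly increasing implies both sorted and free of duplicates.
    return all(a < b for a, b in zip(ids, ids[1:]))


def shape_problems(payload: Any) -> list[str]:
    """Return human-readable shape violations (empty list means OK)."""
    if not isinstance(payload, dict):
        return ["top level is not an object"]
    problems: list[str] = []
    missing = TOP_LEVEL_REQUIRED - set(payload)
    if missing:
        problems.append(f"missing top-level keys: {sorted(missing)}")
    meta = payload.get("meta")
    if isinstance(meta, dict):
        missing_meta = META_REQUIRED - set(meta)
        if missing_meta:
            problems.append(f"missing meta keys: {sorted(missing_meta)}")
    elif "meta" in payload:
        problems.append("meta is not an object")
    clones = payload.get("clones")
    if isinstance(clones, dict):
        for key in sorted(CLONES_REQUIRED):
            if key not in clones:
                problems.append(f"clones.{key} is missing")
            elif not is_sorted_unique(clones[key]):
                problems.append(f"clones.{key} is not a sorted/unique list[str]")
    elif "clones" in payload:
        problems.append("clones is not an object")
    return problems
```

A legacy payload with `functions`/`blocks` at the root fails this check because both `meta` and `clones` are missing.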
Contracts
Compatibility gates (`verify_compatibility`):
- `generator == "codeclone"`
- `schema_version` major/minor must be supported by runtime
- `fingerprint_version == BASELINE_FINGERPRINT_VERSION`
- `python_tag == current_python_tag()`
- integrity verified via `payload_sha256`
Embedded metrics contract:
- Top-level `metrics` is allowed only for baseline schema `>= 2.0`.
- Clone baseline save preserves existing embedded `metrics` payload and `meta.metrics_payload_sha256`.
Integrity payload includes only:
- `clones.functions`
- `clones.blocks`
- `meta.fingerprint_version`
- `meta.python_tag`
Integrity payload excludes:
- `meta.schema_version`
- `meta.generator.*`
- `meta.created_at`
Refs:
- `codeclone/baseline.py:Baseline.verify_compatibility`
- `codeclone/baseline.py:_compute_payload_sha256`
- `codeclone/baseline.py:_preserve_embedded_metrics`
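The non-integrity compatibility gates might be sketched as an ordered series of checks that returns the first failing status name. Several things here are assumptions rather than documented behavior: the function `first_failed_gate` is hypothetical, the check order is a guess, `generator` is treated as a plain string, the expected fingerprint version and Python tag are passed in as parameters, and "supported" is modeled as "same major, minor not newer than the runtime".

```python
from typing import Mapping, Optional


def first_failed_gate(
    meta: Mapping[str, object],
    *,
    supported_major: int,
    max_minor: int,
    fingerprint_version: str,
    python_tag: str,
) -> Optional[str]:
    """Return the status name of the first failed gate, or None if all pass."""
    if meta.get("generator") != "codeclone":
        return "generator_mismatch"

    # Assumed "MAJOR.MINOR" string format for schema_version.
    try:
        major_s, minor_s = str(meta.get("schema_version")).split(".")[:2]
        major, minor = int(major_s), int(minor_s)
    except ValueError:
        return "mismatch_schema_version"
    # Assumed support rule: identical major, minor up to the runtime's own.
    if major != supported_major or minor > max_minor:
        return "mismatch_schema_version"

    if meta.get("fingerprint_version") != fingerprint_version:
        return "mismatch_fingerprint_version"
    if meta.get("python_tag") != python_tag:
        return "mismatch_python_version"
    return None
```

Returning a status name instead of raising keeps the result easy to map onto the status table, although the real `verify_compatibility` may signal failures differently.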
Invariants (MUST)
- Legacy top-level baselines (`functions`/`blocks` at root) are untrusted and require regeneration.
- Baseline writes are atomic (`*.tmp` + `os.replace`, same filesystem).
- Baseline diff is set-based and deterministic.
Refs:
- `codeclone/baseline.py:_is_legacy_baseline_payload`
- `codeclone/baseline.py:_atomic_write_json`
- `codeclone/baseline.py:Baseline.diff`
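The atomic-write and set-based-diff invariants can be illustrated with a short sketch. The helper names `atomic_write_json` and `new_clone_ids` are hypothetical, and the `fsync` call and two-space indentation are choices made for this example, not documented behavior of `_atomic_write_json`.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a sibling *.tmp file, then swap it into place."""
    # The temp file sits next to the target so both share a filesystem,
    # which is what lets os.replace swap the file atomically.
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)
        fh.write("\n")
        fh.flush()
        os.fsync(fh.fileno())  # extra durability step, an assumption here
    os.replace(tmp, path)


def new_clone_ids(current: set[str], baseline: set[str]) -> list[str]:
    """Set difference, returned sorted so the result is deterministic."""
    return sorted(current - baseline)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "baseline.json"
    atomic_write_json(target, {"clones": {"functions": ["a"], "blocks": []}})
    written = json.loads(target.read_text(encoding="utf-8"))
    leftover_tmp = target.with_name(target.name + ".tmp").exists()
```

Readers of the target path see either the previous file or the complete new one, never a partially written baseline.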
Failure modes
| Condition | Status |
|---|---|
| File missing | missing |
| Too large | too_large |
| JSON decode failure | invalid_json |
| Top-level shape/type mismatch | invalid_type / missing_fields |
| Schema mismatch | mismatch_schema_version |
| Fingerprint mismatch | mismatch_fingerprint_version |
| Python tag mismatch | mismatch_python_version |
| Generator mismatch | generator_mismatch |
| Hash missing/invalid | integrity_missing |
| Hash mismatch | integrity_failed |
CLI behavior:
- Normal mode: an untrusted baseline is ignored and the diff runs against an empty baseline.
- Gating mode (`--ci` / `--fail-on-new`): an untrusted baseline is a contract error (exit 2).
Refs:
- `codeclone/baseline.py:BaselineStatus`
- `codeclone/cli.py:_main_impl`
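The two CLI modes described above reduce to a small decision function. This is only a sketch: `resolve_baseline` is a hypothetical name, and the `trusted` flag stands in for whatever status check `_main_impl` actually performs.

```python
from typing import Optional


def resolve_baseline(
    loaded_ids: frozenset[str],
    *,
    trusted: bool,
    gating: bool,
) -> tuple[frozenset[str], Optional[int]]:
    """Return (baseline IDs to diff against, exit code or None to continue)."""
    if trusted:
        return loaded_ids, None
    if gating:
        # --ci / --fail-on-new: an untrusted baseline is a contract error.
        return frozenset(), 2
    # Normal mode: ignore the untrusted baseline and diff against nothing,
    # so every current clone is reported as new.
    return frozenset(), None
```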
Determinism / canonicalization
- Clone IDs are serialized sorted.
- Hash serialization uses canonical JSON (`sort_keys=True`, compact separators).
- `payload_sha256` is compared with `hmac.compare_digest` during verification.
Refs:
- `codeclone/baseline.py:_baseline_payload`
- `codeclone/baseline.py:_compute_payload_sha256`
- `codeclone/baseline.py:Baseline.verify_integrity`
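A minimal sketch of the hashing and verification steps follows, under stated assumptions: the key names inside the hashed object are illustrative (the real layout in `_compute_payload_sha256` may differ), and the functions here take a whole baseline dict for brevity. Only the four fields listed under "Integrity payload includes only" feed the hash.

```python
import hashlib
import hmac
import json


def compute_payload_sha256(baseline: dict) -> str:
    """Hash only the integrity-relevant fields, serialized canonically."""
    meta, clones = baseline["meta"], baseline["clones"]
    payload = {
        # Assumed layout of the hashed object; field selection follows the doc.
        "functions": sorted(clones["functions"]),
        "blocks": sorted(clones["blocks"]),
        "fingerprint_version": meta["fingerprint_version"],
        "python_tag": meta["python_tag"],
    }
    # Canonical JSON: sorted keys and compact separators, so equal payloads
    # always serialize to identical bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_integrity(baseline: dict) -> bool:
    """Constant-time comparison of the stored and recomputed digests."""
    stored = baseline["meta"].get("payload_sha256")
    if not isinstance(stored, str):
        return False
    expected = compute_payload_sha256(baseline)
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(stored.encode("utf-8"), expected.encode("utf-8"))
```

Because `meta.schema_version` and `meta.created_at` are excluded from the hashed object, changing them leaves the digest unchanged, while any change to the clone ID lists invalidates it.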
Locked by tests
- `tests/test_baseline.py::test_baseline_roundtrip_v1`
- `tests/test_baseline.py::test_baseline_payload_fields_contract_invariant`
- `tests/test_baseline.py::test_baseline_payload_sha256_independent_of_schema_version`
- `tests/test_baseline.py::test_baseline_verify_python_tag_mismatch`
- `tests/test_cli_inprocess.py::test_cli_reports_include_audit_metadata_schema_mismatch`
Non-guarantees
- Baseline generator version (`meta.generator.version`) is informational and not a compatibility gate.
- Baseline file indentation/style is not part of the compatibility contract.