14. Compatibility and Versioning
Purpose
Define when to bump baseline/cache/report/fingerprint versions and how runtime compatibility is enforced.
Public surface
- Version constants: `codeclone/contracts.py`
- Baseline compatibility checks: `codeclone/baseline.py:Baseline.verify_compatibility`
- Metrics baseline compatibility checks: `codeclone/metrics_baseline.py:MetricsBaseline.verify_compatibility`
- Cache compatibility checks: `codeclone/cache.py:Cache.load`
- Report schema assignment: `codeclone/report/json_contract.py:build_report_document`
- MCP public surface: `codeclone/mcp_server.py`, `codeclone/mcp_service.py`
Data model
Current contract versions:
- `BASELINE_SCHEMA_VERSION = "2.1"`
- `BASELINE_FINGERPRINT_VERSION = "1"`
- `CACHE_VERSION = "2.5"`
- `REPORT_SCHEMA_VERSION = "2.8"`
- `METRICS_BASELINE_SCHEMA_VERSION = "1.2"` (used only when metrics are stored in a dedicated metrics-baseline file instead of the default unified baseline)
Refs:
codeclone/contracts.py
Contracts
Version bump rules:
- Bump baseline schema only for baseline JSON layout/type changes.
- Bump fingerprint version when clone key semantics change.
- Bump cache schema for cache wire-format/validation changes and for cached-analysis semantic changes that would otherwise leave stale cache entries looking compatible to runtime validation.
- Bump report schema for canonical report document contract changes (`report_schema_version`, consumed by JSON/TXT/Markdown/SARIF and HTML provenance/view).
- Bump metrics-baseline schema only for dedicated metrics-baseline payload changes.
- This schema does not imply that metrics normally live in a separate file: the default runtime path is still the unified baseline file, and the standalone metrics-baseline schema applies only when users opt into a different metrics-baseline path.
- MCP does not currently define a separate schema/version constant; tool names, resource shapes, and documented request/response semantics are therefore package-versioned public surface and must be documented/tested when changed.
- Slimming or splitting MCP-only projections (for example, summary payloads or `metrics` vs `metrics_detail`) does not change `report_schema_version` as long as the canonical report document and finding identities remain unchanged.
- The same rule applies to finding-level MCP projection changes such as short MCP ids, slim summary locations, or omitting `priority_factors` outside `detail_level="full"`.
- Additive MCP-only convenience fields/projections such as `cache.freshness`, production-first triage, `health_scope`, `focus`, or `new_by_source_kind` also do not change `report_schema_version` when they are derived from unchanged canonical report and summary data.
- The same rule applies to bounded MCP semantic guidance such as `help(topic=...)`: package-versioned wording and routing may evolve, but they do not change `report_schema_version` as long as canonical report semantics and finding identities remain unchanged.
- Canonical report changes such as `meta.analysis_thresholds.design_findings` or threshold-aware design finding materialization do change `report_schema_version` because they alter canonical report semantics and integrity payload.
- The same is true for additive canonical metrics families such as `metrics.families.overloaded_modules`, `coverage_adoption`, `api_surface`, or `coverage_join`: even when the layer is report-only or current-run only, it still changes canonical report schema and integrity payload, so it requires a report-schema bump.
- The same rule applies to new canonical suppressed-finding buckets such as `findings.groups.clones.suppressed.*`: even though they are non-active review facts, they still change canonical report shape and integrity payload.
- CodeClone does not currently define a separate health-model version constant. Health-score semantics are package-versioned and must be documented in the Health Score chapter and release notes when they change.
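The bump rules above can be condensed into a lookup table that a reviewer could use as a checklist. The sketch below is illustrative only: `ChangeKind`, `BUMP_RULES`, and `constant_to_bump` are hypothetical names and not part of CodeClone; only the constant names are taken from this chapter.

```python
from __future__ import annotations

from enum import Enum


class ChangeKind(Enum):
    """Hypothetical labels for the kinds of change discussed in this chapter."""

    BASELINE_JSON_LAYOUT = "baseline_json_layout"
    CLONE_KEY_SEMANTICS = "clone_key_semantics"
    CACHE_WIRE_FORMAT = "cache_wire_format"
    CACHED_ANALYSIS_SEMANTICS = "cached_analysis_semantics"
    CANONICAL_REPORT_CONTRACT = "canonical_report_contract"
    METRICS_BASELINE_PAYLOAD = "metrics_baseline_payload"
    MCP_ONLY_PROJECTION = "mcp_only_projection"
    HEALTH_MODEL = "health_model"


# None means "no version constant to bump": the change is package-versioned
# and must be covered by documentation, tests, and release notes instead.
BUMP_RULES: dict[ChangeKind, str | None] = {
    ChangeKind.BASELINE_JSON_LAYOUT: "BASELINE_SCHEMA_VERSION",
    ChangeKind.CLONE_KEY_SEMANTICS: "BASELINE_FINGERPRINT_VERSION",
    ChangeKind.CACHE_WIRE_FORMAT: "CACHE_VERSION",
    ChangeKind.CACHED_ANALYSIS_SEMANTICS: "CACHE_VERSION",
    ChangeKind.CANONICAL_REPORT_CONTRACT: "REPORT_SCHEMA_VERSION",
    ChangeKind.METRICS_BASELINE_PAYLOAD: "METRICS_BASELINE_SCHEMA_VERSION",
    ChangeKind.MCP_ONLY_PROJECTION: None,
    ChangeKind.HEALTH_MODEL: None,
}


def constant_to_bump(kind: ChangeKind) -> str | None:
    """Return the name of the constant to bump, or None if none applies."""
    return BUMP_RULES[kind]
```

A table like this keeps the "no bump, but document it" cases explicit instead of leaving them as missing entries.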
Baseline compatibility rules:
- Runtime accepts baseline schema majors `1` and `2` with supported minors.
- Runtime writes current schema (`2.1`) on new/updated baseline saves.
- Embedded top-level `metrics` is valid only for baseline schema `>= 2.0`.
- Unified clone baselines may also embed top-level `api_surface` when metrics baseline data is stored in the same file.
- Embedded and standalone `api_surface` snapshots now use compact symbol wire layout (`local_name` relative to `module`, `filepath` relative to the baseline directory when possible) while runtime reconstructs full canonical qualnames and runtime filepaths before comparison. This is a schema change for baseline `2.1` / metrics-baseline `1.2`, not a silent serialization detail.
- Capability-sensitive metrics gates (for example adoption regression or API break gating) must check for the required embedded data, not only the clone baseline schema version.
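As a rough illustration of these rules, the sketch below checks a `MAJOR.MINOR` schema string against a table of supported minors and gates a capability on the presence of embedded data instead of on the version alone. All function names are hypothetical and do not describe `Baseline.verify_compatibility`. The table lists only the `2.0`/`2.1` window mentioned in this chapter; supported `1.x` minors are not stated here, so they are deliberately left out.

```python
from __future__ import annotations

from typing import Any, Mapping

# Illustrative table: major -> accepted minors. Only the 2.x window is taken
# from this chapter; 1.x minors are omitted because they are not listed here.
ILLUSTRATIVE_SUPPORTED_MINORS: Mapping[int, frozenset[int]] = {2: frozenset({0, 1})}


def parse_schema_version(text: str) -> tuple[int, int]:
    """Split "MAJOR.MINOR" into integers; raises ValueError on bad input."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_supported_schema(
    version: str,
    supported: Mapping[int, frozenset[int]] = ILLUSTRATIVE_SUPPORTED_MINORS,
) -> bool:
    """Accept a version only if both its major and its minor are known."""
    try:
        major, minor = parse_schema_version(version)
    except ValueError:
        return False
    return minor in supported.get(major, frozenset())


def can_embed_metrics(version: str) -> bool:
    """Embedded top-level metrics require baseline schema >= 2.0."""
    return parse_schema_version(version) >= (2, 0)


def can_gate_api_breaks(baseline_payload: Mapping[str, Any]) -> bool:
    """Gate on the embedded data itself, not on the schema version alone."""
    return "api_surface" in baseline_payload
```

For example, a payload that declares schema `2.1` but carries no `api_surface` key passes `is_supported_schema` while `can_gate_api_breaks` still returns `False`, which is the distinction the last rule asks for.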
Metrics-baseline compatibility rules:
- Runtime writes standalone metrics-baseline schema `1.2`.
- Runtime accepts standalone metrics-baseline `1.1` and `1.2`.
- When metrics are embedded into the unified clone baseline, the embedded metrics section follows the clone baseline schema compatibility window instead (`2.0` and `2.1` in the current runtime).
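The two compatibility windows can be pictured as follows. This is a minimal sketch with hypothetical names; the version values are the ones quoted in the rules above, and the real check lives in `MetricsBaseline.verify_compatibility`, whose internals are not shown in this chapter.

```python
from __future__ import annotations

# Windows quoted from the rules above.
STANDALONE_METRICS_WINDOW = frozenset({"1.1", "1.2"})
EMBEDDED_METRICS_WINDOW = frozenset({"2.0", "2.1"})


def accepted_metrics_schemas(*, embedded: bool) -> frozenset[str]:
    """Pick the window: embedded metrics follow the clone baseline schema."""
    return EMBEDDED_METRICS_WINDOW if embedded else STANDALONE_METRICS_WINDOW


def metrics_schema_accepted(schema_version: str, *, embedded: bool) -> bool:
    """Check a schema string against the window that applies to its storage."""
    return schema_version in accepted_metrics_schemas(embedded=embedded)
```

The same version string can therefore be valid in one storage mode and invalid in the other: `"1.2"` is accepted for a standalone file but not for an embedded section.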
Baseline regeneration rules:
- Required when `fingerprint_version` changes.
- Required when `python_tag` changes.
- Not required for package patch/minor updates if compatibility gates still pass.
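A minimal sketch of the regeneration decision, assuming the baseline records a fingerprint version and a Python tag next to the package version. `BaselineStamp` and `regeneration_reasons` are hypothetical names, and the tag values used in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineStamp:
    """Hypothetical subset of the metadata stored with a baseline."""

    fingerprint_version: str
    python_tag: str
    package_version: str


def regeneration_reasons(stored: BaselineStamp, current: BaselineStamp) -> list[str]:
    """List why a baseline must be regenerated; empty means it can be kept."""
    reasons: list[str] = []
    if stored.fingerprint_version != current.fingerprint_version:
        reasons.append("fingerprint_version changed")
    if stored.python_tag != current.python_tag:
        reasons.append("python_tag changed")
    # package_version is deliberately not compared: a patch/minor upgrade
    # alone does not force regeneration while the other gates still pass.
    return reasons


stored = BaselineStamp("1", "cp312", "0.0.1")    # placeholder values
upgraded = BaselineStamp("1", "cp312", "0.0.2")  # package upgrade only
new_python = BaselineStamp("1", "cp313", "0.0.2")
```

With these values, `regeneration_reasons(stored, upgraded)` is empty, while `regeneration_reasons(stored, new_python)` reports the `python_tag` change.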
Health model evolution
Health Score is stable within a given scoring model, but the scoring model may evolve across releases.
New signal families may first appear as report-only or experimental layers. After validation and contract hardening, selected layers may later be promoted into scoring.
Future CodeClone releases may expand the Health Score formula with additional validated signal families. As a result, a repository's score may decrease after upgrade even if the code itself did not become worse. In such cases, the change reflects an evolved scoring model rather than a retroactive decline in code quality.
Short operational reminder:
A lower score after upgrade may reflect a broader health model, not only worse code.
Contract consequence:
- health-model expansion does not necessarily require a baseline/cache/report schema bump;
- but it does require explicit documentation and release-note coverage, because it changes user-visible scoring semantics.
Invariants (MUST)
- Contract changes must include code updates and changelog/docs updates.
- Schema mismatches must map to explicit statuses.
- Legacy baseline payloads (<=1.3 layout) remain untrusted and require regeneration.
Refs:
- `codeclone/baseline.py:BaselineStatus`
- `codeclone/baseline.py:_is_legacy_baseline_payload`
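To illustrate the invariant that schema mismatches map to explicit statuses, the sketch below returns a named status for each failure instead of a bare boolean. `ExampleStatus` and `classify` are hypothetical; the members and the order of checks are assumptions and do not reproduce the real `BaselineStatus` enum.

```python
from __future__ import annotations

from enum import Enum


class ExampleStatus(str, Enum):
    """Hypothetical statuses; not the members of the real BaselineStatus."""

    OK = "ok"
    LEGACY = "legacy"
    SCHEMA_UNSUPPORTED = "schema_unsupported"
    FINGERPRINT_MISMATCH = "fingerprint_mismatch"


def classify(
    schema_version: str,
    fingerprint_version: str,
    *,
    accepted_schemas: set[str],
    expected_fingerprint: str,
    is_legacy: bool = False,
) -> ExampleStatus:
    """Map each incompatibility to one named status (check order is assumed)."""
    if is_legacy:
        return ExampleStatus.LEGACY
    if schema_version not in accepted_schemas:
        return ExampleStatus.SCHEMA_UNSUPPORTED
    if fingerprint_version != expected_fingerprint:
        return ExampleStatus.FINGERPRINT_MISMATCH
    return ExampleStatus.OK
```

Returning a status value lets callers and tests distinguish "regenerate because the fingerprint changed" from "regenerate because the schema is unsupported", which a single pass/fail flag cannot express.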
Failure modes
| Change type | User impact |
|---|---|
| Baseline schema bump | older unsupported baseline schemas become untrusted until regenerated |
| Fingerprint bump | clone IDs change; baseline regeneration required |
| Cache schema bump | old caches are ignored and rebuilt automatically |
| Report schema bump | downstream report consumers must update |
| Metrics-baseline schema bump | dedicated metrics-baseline files must be regenerated |
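The cache row differs from the baseline rows in that a mismatch is recoverable without user action. A minimal sketch of that behaviour, assuming the cache payload is a JSON object that stores its version under a `v` key and its data under an `entries` key (both key names are assumptions, as is the function name):

```python
from __future__ import annotations

from typing import Any

CACHE_VERSION = "2.5"  # value quoted from this chapter


def usable_cache_entries(payload: Any) -> dict[str, Any]:
    """Return cached entries only when the payload matches CACHE_VERSION.

    Any mismatch yields an empty dict, so the caller re-analyses everything
    and writes a fresh cache instead of trusting stale entries.
    """
    if not isinstance(payload, dict):
        return {}
    if payload.get("v") != CACHE_VERSION:
        return {}
    entries = payload.get("entries")
    return entries if isinstance(entries, dict) else {}
```

Treating a mismatch as an empty cache is what makes a cache schema bump safe to ship: the cost is one slower run, not a failed one.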
Determinism / canonicalization
- Version constants are explicit and enforced in code.
- Compatibility decisions are runtime checks, not doc-only expectations.
Refs:
- `codeclone/contracts.py`
- `codeclone/baseline.py:Baseline.verify_compatibility`
- `codeclone/metrics_baseline.py:MetricsBaseline.verify_compatibility`
Locked by tests
- `tests/test_baseline.py::test_baseline_verify_schema_incompatibilities`
- `tests/test_baseline.py::test_baseline_verify_schema_incompatibilities[schema_major_mismatch]`
- `tests/test_baseline.py::test_baseline_verify_fingerprint_mismatch`
- `tests/test_cache.py::test_cache_v_field_version_mismatch_warns`
- `tests/test_report.py::test_report_json_compact_v21_contract`
Non-guarantees
- Backward compatibility is not guaranteed across incompatible schema/fingerprint bumps.
- Health Score is not frozen forever as a mathematical formula; what is frozen is the obligation to document scoring-model changes and present them honestly.