02. Terminology
Purpose
Define terms exactly as used by code and tests.
Public surface
- Baseline identifiers and statuses: `codeclone/baseline.py`
- Cache statuses and compact layout: `codeclone/cache.py`
- Report schema and group layouts: `codeclone/report/json_contract.py`
Data model
- fingerprint: function-level CFG fingerprint (`sha1`) + LOC bucket key.
- block_hash: ordered sequence of normalized statement hashes in a fixed window.
- segment_hash: hash of ordered segment window.
- segment_sig: hash of sorted segment window (candidate grouping signature).
- stable structure facts: per-function deterministic structure profile fields (`entry_guard_*`, `terminal_kind`, `try_finally_profile`, `side_effect_order_profile`) reused by report families.
- cohort structural findings: report-only structural families derived from existing function-clone groups (`clone_guard_exit_divergence`, `clone_cohort_drift`).
- python_tag: runtime compatibility tag like `cp313`.
- schema_version:
  - baseline schema (`meta.schema_version`) for baseline compatibility.
  - cache schema (`v`) for cache compatibility.
  - report schema (`report_schema_version`) for report format compatibility.
- payload_sha256: canonical baseline semantic hash.
- trusted baseline: baseline loaded + status `ok`.
Refs:
- `codeclone/grouping.py:build_groups`
- `codeclone/blocks.py:extract_blocks`
- `codeclone/blocks.py:extract_segments`
- `codeclone/baseline.py:current_python_tag`
- `codeclone/baseline.py:Baseline.verify_compatibility`
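The difference between segment_hash (ordered window) and segment_sig (sorted window) can be illustrated with a minimal sketch. The function names, the `|` join separator, and the use of SHA-1 for these two values are assumptions made for illustration only; the actual computation lives in `codeclone/blocks.py`.

```python
import hashlib


def ordered_window_hash(stmt_hashes):
    # Order-sensitive digest, analogous to segment_hash (hypothetical helper).
    joined = "|".join(stmt_hashes)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def sorted_window_sig(stmt_hashes):
    # Order-insensitive digest, analogous to segment_sig (hypothetical helper).
    joined = "|".join(sorted(stmt_hashes))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


# Two windows holding the same statement hashes in a different order.
window_a = ["h1", "h2", "h3"]
window_b = ["h3", "h1", "h2"]

same_sig = sorted_window_sig(window_a) == sorted_window_sig(window_b)
same_hash = ordered_window_hash(window_a) == ordered_window_hash(window_b)
```

Under these assumptions the two windows share a signature but not an ordered hash, which is why a sorted-window signature is suited to candidate grouping while the ordered hash identifies an exact sequence.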
Contracts
- New/known classification is key-based, not item-heuristic-based.
- Baseline trust is status-driven.
- Cache trust is status-driven and independent from baseline trust.
Refs:
- `codeclone/report/json_contract.py:build_report_document`
- `codeclone/cli.py:_main_impl`
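Key-based classification can be sketched as a plain set-membership test: a group is known when its key appears in the baseline, and new otherwise. The helper name `classify_group_keys` and the sample key values are hypothetical; this is not the code in `codeclone/cli.py`.

```python
def classify_group_keys(current_keys, baseline_keys):
    """Split current group keys into (new, known) by baseline membership.

    Hypothetical helper: only the key decides the outcome, never the
    contents of the items inside a group.
    """
    baseline = set(baseline_keys)
    current = set(current_keys)
    new = sorted(key for key in current if key not in baseline)
    known = sorted(key for key in current if key in baseline)
    return new, known


# Placeholder keys in the fingerprint|loc_bucket shape; values are made up.
new_keys, known_keys = classify_group_keys(
    current_keys=["aaa|20", "bbb|50"],
    baseline_keys=["aaa|20", "ccc|20"],
)
```

Here `new_keys` holds only `bbb|50` and `known_keys` holds only `aaa|20`; the baseline-only key `ccc|20` appears in neither list because it is absent from the current run.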
Invariants (MUST)
- Function group key format: `fingerprint|loc_bucket`.
- Block group key format: `block_hash`.
- Segment group key format: `segment_hash|qualname` (internal/report-only grouping path).
Refs:
- `codeclone/grouping.py:build_groups`
- `codeclone/grouping.py:build_block_groups`
- `codeclone/grouping.py:build_segment_groups`
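The three key formats above can be written out as small string builders. This is a sketch with hypothetical helper names and made-up sample values; the real grouping functions are the ones listed in the refs.

```python
def function_group_key(fingerprint: str, loc_bucket: str) -> str:
    # Function groups: fingerprint|loc_bucket
    return f"{fingerprint}|{loc_bucket}"


def block_group_key(block_hash: str) -> str:
    # Block groups: the block hash is the whole key.
    return block_hash


def segment_group_key(segment_hash: str, qualname: str) -> str:
    # Segment groups: segment_hash|qualname (internal/report-only path).
    return f"{segment_hash}|{qualname}"


# Sample values are placeholders, not real hashes or buckets.
fn_key = function_group_key("abc123", "20")
blk_key = block_group_key("def456")
seg_key = segment_group_key("0a1b2c", "pkg.mod:func")
```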
Failure modes
| Condition | Result |
|---|---|
| Baseline generator name != codeclone | generator_mismatch |
| Baseline python tag mismatch | mismatch_python_version |
| Cache signature mismatch | integrity_failed cache status |
Refs:
- `codeclone/baseline.py:Baseline.verify_compatibility`
- `codeclone/cache.py:Cache.load`
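The two baseline rows of the table can be read as a status mapping. The sketch below is illustrative only: `baseline_compat_status` is a hypothetical function, the order of the checks is an assumption, and the real `Baseline.verify_compatibility` may signal these conditions differently (for example by raising).

```python
def baseline_compat_status(generator: str, baseline_tag: str, runtime_tag: str) -> str:
    """Map the baseline conditions from the table to status strings.

    Hypothetical sketch; check order is an assumption.
    """
    if generator != "codeclone":
        return "generator_mismatch"
    if baseline_tag != runtime_tag:
        return "mismatch_python_version"
    return "ok"


status_ok = baseline_compat_status("codeclone", "cp313", "cp313")
status_gen = baseline_compat_status("other-tool", "cp313", "cp313")
status_py = baseline_compat_status("codeclone", "cp312", "cp313")
```

Only the `ok` result corresponds to a trusted baseline in the sense defined under the data model.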
Determinism / canonicalization
- Baseline clone ID lists must be sorted and unique.
- Cache compact arrays are sorted by deterministic tuple keys before write.
Refs:
- `codeclone/baseline.py:_require_sorted_unique_ids`
- `codeclone/cache.py:_encode_wire_file_entry`
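A "sorted and unique" requirement is equivalent to the list being strictly increasing, which a single pass over adjacent pairs can check. The helpers below are a hypothetical sketch of that idea, not the body of `_require_sorted_unique_ids`; the error type and message are assumptions.

```python
def is_sorted_unique(ids):
    # Strictly increasing adjacent pairs imply both sorted and duplicate-free.
    return all(prev < cur for prev, cur in zip(ids, ids[1:]))


def require_sorted_unique(ids, field="ids"):
    # Hypothetical validator: reject lists that are unsorted or contain duplicates.
    if not is_sorted_unique(ids):
        raise ValueError(f"{field} must be sorted and unique")
    return ids


valid = is_sorted_unique(["a|20", "b|20", "c|50"])
unsorted = is_sorted_unique(["b|20", "a|20"])
duplicated = is_sorted_unique(["a|20", "a|20"])
```

Checking on load rather than sorting silently keeps the stored form canonical: two baselines with the same content then serialize identically.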
Locked by tests
- `tests/test_baseline.py::test_baseline_id_lists_must_be_sorted_and_unique`
- `tests/test_report.py::test_report_json_group_order_is_lexicographic`
- `tests/test_cache.py::test_cache_version_mismatch_warns`
Non-guarantees
- Exact wording of status descriptions in UI is not a schema contract.