08. Report¶
Purpose¶
Define report contracts in 2.0.0b1: canonical JSON (report_schema_version=2.1)
plus deterministic TXT/Markdown/SARIF projections.
Public surface¶
- Canonical report builder: `codeclone/report/json_contract.py:build_report_document`
- JSON/TXT renderers: `codeclone/report/serialize.py`
- Markdown renderer: `codeclone/report/markdown.py`
- SARIF renderer: `codeclone/report/sarif.py`
- HTML renderer: `codeclone/html_report.py:build_html_report`
- Shared metadata source: `codeclone/_cli_meta.py:_build_report_meta`
Data model¶
JSON report top-level (v2.1):
- `report_schema_version`
- `meta`
- `inventory`
- `findings`
- `metrics`
- `derived`
- `integrity`
Canonical vs non-canonical split:
- Canonical: `report_schema_version`, `meta`, `inventory`, `findings`, `metrics`
- Non-canonical projection layer: `derived`
- Integrity metadata: `integrity` (`canonicalization` + `digest`)
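As a rough illustration of this split, the sketch below partitions a report document into its canonical and non-canonical parts. The key names come from the lists above; the helper name `split_report` and the placeholder document are hypothetical and not part of codeclone.

```python
from typing import Any

# Top-level keys named in the v2.1 data model above.
CANONICAL_KEYS = ("report_schema_version", "meta", "inventory", "findings", "metrics")
NON_CANONICAL_KEYS = ("derived",)


def split_report(report: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (canonical, non_canonical) views of a report document.

    Hypothetical helper for illustration; not a codeclone API.
    The `integrity` section is metadata and lands in neither view.
    """
    canonical = {k: report[k] for k in CANONICAL_KEYS if k in report}
    non_canonical = {k: report[k] for k in NON_CANONICAL_KEYS if k in report}
    return canonical, non_canonical


# Placeholder document: only the key layout is taken from the contract.
sample_report = {
    "report_schema_version": "2.1",
    "meta": {},
    "inventory": {},
    "findings": {},
    "metrics": {},
    "derived": {},
    "integrity": {},
}
canonical_view, non_canonical_view = split_report(sample_report)
```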
Derived projection layer:
- `derived.suggestions[*]`: actionable projection cards keyed back to canonical findings via `finding_id`
- `derived.overview`: summary-only overview facts:
    - `families`
    - `top_risks`
    - `source_scope_breakdown`
    - `health_snapshot`
- `derived.hotlists`: deterministic lists of canonical finding IDs:
    - `most_actionable_ids`
    - `highest_spread_ids`
    - `production_hotspot_ids`
    - `test_fixture_hotspot_ids`
Finding families:
- `findings.groups.clones.{functions,blocks,segments}`
- `findings.groups.structural.groups`
- `findings.groups.dead_code.groups`
- `findings.groups.design.groups`
- `findings.summary.suppressed.dead_code` (suppressed counter, non-active findings)
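A consumer that wants every active finding group has to walk each of these paths. The sketch below shows one way to do that. It assumes each leaf (`clones.functions`, `structural.groups`, and so on) is a list of group objects, which the contract above does not spell out; the function name and the sample IDs are hypothetical.

```python
from typing import Any, Iterator


def iter_finding_groups(report: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every active finding group in a fixed family order.

    Assumption: each leaf container is a list of dicts.
    """
    groups = report.get("findings", {}).get("groups", {})
    clones = groups.get("clones", {})
    # Clone groups are split into three buckets.
    for bucket in ("functions", "blocks", "segments"):
        yield from clones.get(bucket, [])
    # The remaining families each expose a single `groups` list.
    for family in ("structural", "dead_code", "design"):
        yield from groups.get(family, {}).get("groups", [])


# Placeholder data; the IDs are made up for the example.
sample_report = {
    "findings": {
        "groups": {
            "clones": {"functions": [{"id": "c1"}], "blocks": [], "segments": []},
            "structural": {"groups": [{"id": "s1"}]},
            "dead_code": {"groups": []},
            "design": {"groups": [{"id": "d1"}]},
        }
    }
}
group_ids = [group["id"] for group in iter_finding_groups(sample_report)]
```

The suppressed dead-code counter under `findings.summary` is deliberately not visited, since it does not describe active findings.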
Structural finding kinds currently emitted by core/report pipeline:
- `duplicated_branches`
- `clone_guard_exit_divergence`
- `clone_cohort_drift`
Per-group common axes (family-specific fields may extend):
- identity: `id`, `family`, `category`, `kind`
- assessment: `severity`, `confidence`, `priority`
- scope: `source_scope` (`dominant_kind`, `breakdown`, `impact_scope`)
- spread: `spread.files`, `spread.functions`
- evidence: `items`, `facts` (+ optional `display_facts`)
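To make the axes concrete, the following sketch reports which common fields a group object lacks. It assumes the axis fields are top-level keys of each group and checks only that level (not the nested `source_scope` or `spread` members); `missing_common_fields` and the sample group are hypothetical.

```python
from typing import Any

# Axis -> required top-level fields, taken from the list above.
# The optional `display_facts` is intentionally left out.
COMMON_AXES: dict[str, tuple[str, ...]] = {
    "identity": ("id", "family", "category", "kind"),
    "assessment": ("severity", "confidence", "priority"),
    "scope": ("source_scope",),
    "spread": ("spread",),
    "evidence": ("items", "facts"),
}


def missing_common_fields(group: dict[str, Any]) -> list[str]:
    """Return 'axis:field' labels for every common field absent from `group`."""
    return [
        f"{axis}:{field}"
        for axis, fields in COMMON_AXES.items()
        for field in fields
        if field not in group
    ]


# Placeholder group carrying only the identity axis.
partial_group = {
    "id": "g1",
    "family": "placeholder",
    "category": "placeholder",
    "kind": "placeholder",
}
missing = missing_common_fields(partial_group)
```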
Contracts¶
- JSON is source of truth for report semantics.
- Markdown and SARIF are deterministic projections from the same report document.
- SARIF is an IDE/code-scanning-oriented projection:
    - repo-relative result paths are anchored via `%SRCROOT%`
    - referenced files are listed under `run.artifacts`
    - clone results carry `baselineState` when clone novelty is known
- Derived layer (`suggestions`, `overview`, `hotlists`) does not replace canonical findings/metrics.
- HTML overview cards are materialized from canonical findings plus `derived.overview` + `derived.hotlists`; pre-expanded overview card payloads are not part of the report contract.
- Overview hotspot/source-breakdown sections must resolve from canonical report data or deterministic derived IDs; HTML must not silently substitute stale placeholders such as `n/a` or empty-state cards when canonical data exists.
- `report_generated_at_utc` is carried in `meta.runtime` and reused by UI/renderers.
- Canonical `meta.scan_root` is normalized to `"."`; absolute runtime paths are exposed under `meta.runtime.*_absolute`.
- `clone_type` and `novelty` are group-level properties inside clone groups.
- Cohort-drift structural families are report-only and must not affect baseline diff or CI gating decisions.
- Dead-code suppressed candidates are carried only under metrics (`metrics.families.dead_code.suppressed_items`) and never promoted to active `findings.groups.dead_code`.
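The SARIF points above can be pictured with a small sketch that builds `run.artifacts` together with matching `artifactLocation` objects. The field names (`uri`, `uriBaseId`, `index`) follow SARIF 2.1.0; how `codeclone/report/sarif.py` actually assembles them is not shown here, so the function below is an assumption-laden illustration with a hypothetical name and made-up paths.

```python
from typing import Any

SRCROOT = "%SRCROOT%"  # uriBaseId used to anchor repo-relative paths


def build_artifacts_and_locations(
    paths: list[str],
) -> tuple[list[dict[str, Any]], dict[str, dict[str, Any]]]:
    """Build `run.artifacts` plus one artifactLocation per path.

    Paths are deduplicated and sorted so the output order is stable, and
    each location's `index` points at its entry in the artifacts list.
    """
    unique_paths = sorted(set(paths))
    artifacts = [
        {"location": {"uri": path, "uriBaseId": SRCROOT}} for path in unique_paths
    ]
    locations = {
        path: {"uri": path, "uriBaseId": SRCROOT, "index": index}
        for index, path in enumerate(unique_paths)
    }
    return artifacts, locations


# Made-up repo-relative paths, given out of order and with a duplicate.
artifacts, locations = build_artifacts_and_locations(
    ["src/b.py", "src/a.py", "src/b.py"]
)
```

Deriving both structures from the same sorted list is one simple way to keep artifact indices and result locations from drifting apart.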
Invariants (MUST)¶
- Stable ordering for groups/items/suggestions/hotlists.
- Stable ordering for SARIF rules, artifacts, and results.
- `derived.suggestions[*].finding_id` references existing canonical finding IDs.
- `derived.hotlists.*_ids` reference existing canonical finding IDs.
- SARIF `artifacts[*]` and `locations[*].artifactLocation.index` stay aligned.
- `integrity.digest` is computed from canonical sections only (derived excluded).
- `source_scope.impact_scope` is explicit and deterministic (`runtime`, `non_runtime`, `mixed`).
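The two reference invariants on `derived` lend themselves to a mechanical check. The sketch below collects every finding ID referenced from `derived.suggestions` and `derived.hotlists` and returns those that are missing from a given set of canonical IDs. It assumes `suggestions` is a list of objects and each `*_ids` hotlist is a list of strings; the function name and sample data are hypothetical, and this is not the project's own test code.

```python
from typing import Any


def find_dangling_derived_ids(
    report: dict[str, Any], canonical_ids: set[str]
) -> list[str]:
    """Return sorted finding IDs referenced by `derived` but not canonical."""
    derived = report.get("derived", {})
    referenced: list[str] = []
    # Each suggestion card points back to one canonical finding.
    for suggestion in derived.get("suggestions", []):
        referenced.append(suggestion["finding_id"])
    # Every hotlist key ending in `_ids` holds canonical finding IDs.
    for key, ids in derived.get("hotlists", {}).items():
        if key.endswith("_ids"):
            referenced.extend(ids)
    return sorted({ref for ref in referenced if ref not in canonical_ids})


# Placeholder data: "g3" is referenced but has no canonical group.
sample_report = {
    "derived": {
        "suggestions": [{"finding_id": "g1"}],
        "hotlists": {
            "most_actionable_ids": ["g1", "g2"],
            "highest_spread_ids": ["g3"],
        },
    }
}
dangling = find_dangling_derived_ids(sample_report, {"g1", "g2"})
```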
Failure modes¶
| Condition | Behavior |
|---|---|
| Missing optional UI/meta fields | Renderer falls back to empty/(none) display |
| Untrusted baseline | Clone novelty resolves to new for all groups |
| Missing snippet source in HTML | Safe fallback snippet block |
Determinism / canonicalization¶
- Canonical payload is serialized with sorted keys for digest computation.
- Inventory file registry is normalized to relative paths.
- Structural findings are normalized, deduplicated, and sorted before serialization.
Refs:
- `codeclone/report/json_contract.py:_build_integrity_payload`
- `codeclone/report/json_contract.py:_build_inventory_payload`
- `codeclone/structural_findings.py:normalize_structural_findings`
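A minimal sketch of a digest over canonical sections only might look like the following. The sorted-key serialization and the exclusion of `derived` come from this section; the choice of SHA-256, compact separators, and UTF-8 encoding are assumptions made for the example and may differ from what `_build_integrity_payload` does.

```python
import hashlib
import json
from typing import Any

CANONICAL_KEYS = ("report_schema_version", "meta", "inventory", "findings", "metrics")


def canonical_digest(report: dict[str, Any]) -> str:
    """Hash the canonical sections of a report, ignoring `derived`.

    Assumptions: SHA-256, compact JSON separators, UTF-8 encoding.
    """
    canonical = {key: report[key] for key in CANONICAL_KEYS if key in report}
    # sort_keys makes the serialized form independent of dict insertion order.
    payload = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder document; only the key layout follows the contract.
sample_report = {
    "report_schema_version": "2.1",
    "meta": {},
    "inventory": {},
    "findings": {},
    "metrics": {},
    "derived": {"hotlists": {}},
}
digest_before = canonical_digest(sample_report)
# Changing only the derived layer leaves the digest untouched.
digest_after = canonical_digest({**sample_report, "derived": {"hotlists": {"x_ids": []}}})
```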
Locked by tests¶
- `tests/test_report.py::test_report_json_compact_v21_contract`
- `tests/test_report.py::test_report_json_integrity_matches_canonical_sections`
- `tests/test_report.py::test_report_json_integrity_ignores_derived_changes`
- `tests/test_report_contract_coverage.py::test_report_document_rich_invariants_and_renderers`
- `tests/test_report_contract_coverage.py::test_markdown_and_sarif_reuse_prebuilt_report_document`
- `tests/test_report_branch_invariants.py::test_overview_and_sarif_branch_invariants`
- `tests/test_report.py::test_json_includes_clone_guard_exit_divergence_structural_group`
- `tests/test_report.py::test_json_includes_clone_cohort_drift_structural_group`
- `tests/test_report.py::test_report_json_dead_code_suppressed_items_are_reported_separately`
Non-guarantees¶
- Human-readable wording in `derived` or HTML may evolve without schema bump.
- CSS/layout changes are not part of JSON contract.