08. Report¶
Purpose¶
Define report contracts in 2.0.0b5: canonical JSON (report_schema_version=2.8)
plus deterministic TXT/Markdown/SARIF projections.
Public surface¶
- Canonical report builder: `codeclone/report/json_contract.py:build_report_document`
- JSON/TXT renderers: `codeclone/report/serialize.py`
- Markdown renderer: `codeclone/report/markdown.py`
- SARIF renderer: `codeclone/report/sarif.py`
- HTML renderer: `codeclone/html_report.py:build_html_report`
- Shared metadata source: `codeclone/_cli_meta.py:_build_report_meta`
Data model¶
JSON report top-level (v2.8):
- `report_schema_version`
- `meta`
- `inventory`
- `findings`
- `metrics`
- `derived`
- `integrity`
Canonical provenance additions:
- `meta.analysis_profile` records the effective runtime clone, block, and segment thresholds for that run (`min_loc`, `min_stmt`, `block_*`, `segment_*`).
- `meta.analysis_thresholds.design_findings` records the effective report-level thresholds used to materialize canonical design findings for that run (`complexity > N`, `coupling > N`, `cohesion >= N`).
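The threshold comparisons above can be sketched as a small predicate. This is an illustration only: the `DesignThresholds` class, the `design_finding_kinds` helper, the metric-row shape, and the numeric values are assumptions, not CodeClone's actual API or defaults. Only the comparison operators (`>` for complexity and coupling, `>=` for cohesion) come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DesignThresholds:
    """Hypothetical container for the effective design-finding thresholds."""

    complexity: int  # a finding is materialized when complexity > this value
    coupling: int    # a finding is materialized when coupling > this value
    cohesion: int    # a finding is materialized when cohesion >= this value


def design_finding_kinds(row: dict, t: DesignThresholds) -> list[str]:
    """Return the design axes on which one metric row crosses its threshold."""
    kinds = []
    if row.get("complexity", 0) > t.complexity:
        kinds.append("complexity")
    if row.get("coupling", 0) > t.coupling:
        kinds.append("coupling")
    if row.get("cohesion", 0) >= t.cohesion:
        kinds.append("cohesion")
    return kinds


# Illustrative values only; the real thresholds are whatever the run recorded.
thresholds = DesignThresholds(complexity=20, coupling=10, cohesion=4)
row = {"complexity": 20, "coupling": 11, "cohesion": 4}
print(design_finding_kinds(row, thresholds))  # ['coupling', 'cohesion']
```

Note the boundary behaviour: a complexity equal to the threshold does not produce a finding, while a cohesion value equal to the threshold does.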
Canonical report-only metrics additions:
- `metrics.families.overloaded_modules` records project-relative module hotspot profiles and candidate classification for `Overloaded Modules`
- `metrics.families.coverage_adoption` records parameter coverage, return coverage, public docstring coverage, and `Any` usage counts, plus compact baseline deltas when a trusted metrics baseline is available
- `metrics.families.api_surface` records the current public symbol inventory and compact baseline diff facts (`added`, `breaking`) when `--api-surface` is enabled
- `metrics.families.coverage_join` records an optional current-run join between external Cobertura line coverage and CodeClone function spans. Its summary carries `status`, `source`, unit/line counts, `overall_permille`, `missing_from_report_units`, `coverage_hotspots`, `scope_gap_hotspots`, `hotspot_threshold_percent`, and optional `invalid_reason`; the same compact summary is mirrored in `metrics.summary.coverage_join`; its items carry per-function joined coverage facts, including `coverage_status`, `coverage_hotspot`, and `scope_gap_hotspot`.
- coverage join facts are canonical report truth for that run, but they are not baseline truth and do not update `codeclone.baseline.json`
- adoption/API/coverage-join metrics do not participate in clone baseline NEW/KNOWN semantics; coverage join also does not participate in health scoring, and it gates only when explicitly requested
- `Overloaded Modules` is a report-only experimental layer rather than a second complexity metric:
    - complexity reports local control-flow hotspots in functions and methods
    - `Overloaded Modules` reports module-level responsibility overload and dependency pressure
    - the layer may later become scoring only after validation and explicit health-model documentation updates
Coverage/API role split:
- `coverage_adoption` is a canonical metrics family, not a style linter. It reports observable adoption facts only.
- `coverage_join` is a canonical current-run signal over an external Cobertura XML file. It reports joined line facts and may materialize `design` findings with `category="coverage"` and kinds `coverage_hotspot` (measured below threshold) or `coverage_scope_gap` (outside the supplied coverage scope); it does not infer branch coverage or execute tests.
- `api_surface` is a canonical metrics/gating family, not a second finding engine. It reports public API inventory plus baseline-diff facts when the run opted into API collection.
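The two coverage finding kinds can be told apart with a sketch like the following. The `classify_coverage` helper and its inputs are hypothetical; in particular, using `covered_lines=None` to mean "this function's file is absent from the supplied Cobertura report" is an assumption made for illustration, and the real join may apply further conditions before it materializes a finding.

```python
from __future__ import annotations


def classify_coverage(
    covered_lines: int | None,
    total_lines: int,
    hotspot_threshold_percent: int,
) -> str | None:
    """Classify one function span as a coverage finding kind, or None.

    covered_lines is None when the span lies outside the supplied coverage
    scope (hypothetical encoding, not the real item schema).
    """
    if covered_lines is None:
        return "coverage_scope_gap"  # nothing was measured for this span
    if total_lines <= 0:
        return None  # no executable lines, so there is nothing to assess
    percent = 100 * covered_lines / total_lines
    if percent < hotspot_threshold_percent:
        return "coverage_hotspot"  # measured, but below the threshold
    return None


print(classify_coverage(None, 10, 50))  # coverage_scope_gap
print(classify_coverage(2, 10, 50))     # coverage_hotspot
print(classify_coverage(5, 10, 50))     # None
```

Only line coverage enters the calculation, which matches the statement that the join does not infer branch coverage.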
Canonical vs non-canonical split:
- Canonical: `report_schema_version`, `meta`, `inventory`, `findings`, `metrics`
- Non-canonical projection layer: `derived`
- Integrity metadata: `integrity` (`canonicalization` + `digest`)
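A minimal sketch of this split, assuming the report has already been loaded into a plain `dict`. The `split_report` helper is hypothetical and is not part of the public surface listed above; the section names are the ones given in this document.

```python
from __future__ import annotations

# Section names as listed in this document.
CANONICAL_SECTIONS = ("report_schema_version", "meta", "inventory", "findings", "metrics")


def split_report(report: dict) -> tuple[dict, dict, dict]:
    """Split a loaded report into (canonical, derived, integrity) parts."""
    canonical = {key: report[key] for key in CANONICAL_SECTIONS if key in report}
    derived = report.get("derived", {})
    integrity = report.get("integrity", {})
    return canonical, derived, integrity


# Tiny stand-in document; real reports carry far more data per section.
report = {
    "report_schema_version": "2.8",
    "meta": {},
    "inventory": {},
    "findings": {},
    "metrics": {},
    "derived": {"hotlists": {}},
    "integrity": {"digest": "..."},
}
canonical, derived, integrity = split_report(report)
print(sorted(canonical))
```

Consumers that only care about report semantics can work on the `canonical` part and treat `derived` as a convenience layer.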
Derived projection layer:
- `derived.suggestions[*]`: action-surplus projection cards keyed back to canonical findings via `finding_id`
- `derived.overview`: summary-only overview facts:
    - `families`
    - `top_risks`
    - `source_scope_breakdown`
    - `health_snapshot`
    - `directory_hotspots`
- `derived.hotlists`: deterministic lists of canonical finding IDs:
    - `most_actionable_ids`
    - `highest_spread_ids`
    - `production_hotspot_ids`
    - `test_fixture_hotspot_ids`
Finding families:
- `findings.groups.clones.{functions,blocks,segments}`
- optional `findings.groups.clones.suppressed.{functions,blocks,segments}` for clone groups excluded by project policy such as `golden_fixture_paths`
- `findings.groups.structural.groups`
- `findings.groups.dead_code.groups`
- `findings.groups.design.groups`
- `findings.summary.suppressed.dead_code` (suppressed counter, non-active findings)
- optional `findings.summary.suppressed.clones` plus clone-summary suppressed counters when clone groups were excluded from active findings
Important role split:
- Findings explain what was detected.
- Suggestions exist only when they add action structure on top of a finding (next step, prioritization, effort/risk framing, grouped remediation, or review relevance).
- Low-signal local structural info hints may remain findings-only and not appear as separate suggestion cards.
Structural finding kinds currently emitted by core/report pipeline:
- `duplicated_branches`
- `clone_guard_exit_divergence`
- `clone_cohort_drift`
Per-group common axes (family-specific fields may extend):
- identity: `id`, `family`, `category`, `kind`
- assessment: `severity`, `confidence`, `priority`
- scope: `source_scope` (`dominant_kind`, `breakdown`, `impact_scope`)
- spread: `spread.files`, `spread.functions`
- evidence: `items`, `facts` (+ optional `display_facts`)
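A consumer could check for these common axes with a sketch like the one below. The `missing_common_fields` helper is hypothetical, and the sample group's values are invented placeholders; only the field names come from the list above, and value types are not asserted here.

```python
from __future__ import annotations

# Required top-level fields per axis, taken from the list above.
COMMON_AXES = {
    "identity": ("id", "family", "category", "kind"),
    "assessment": ("severity", "confidence", "priority"),
    "scope": ("source_scope",),
    "spread": ("spread",),
    "evidence": ("items", "facts"),  # display_facts is optional
}
NESTED_FIELDS = {
    "source_scope": ("dominant_kind", "breakdown", "impact_scope"),
    "spread": ("files", "functions"),
}


def missing_common_fields(group: dict) -> list[str]:
    """Return dotted names of common-axis fields absent from one finding group."""
    missing = [
        name
        for fields in COMMON_AXES.values()
        for name in fields
        if name not in group
    ]
    for parent, children in NESTED_FIELDS.items():
        nested = group.get(parent, {})
        missing.extend(f"{parent}.{child}" for child in children if child not in nested)
    return missing


# Placeholder values for illustration; they are not real report data.
group = {
    "id": "example-id", "family": "clones", "category": "function", "kind": "clone",
    "severity": "warning", "confidence": "high", "priority": 1,
    "source_scope": {"dominant_kind": "runtime", "breakdown": {}, "impact_scope": "runtime"},
    "spread": {"files": 2, "functions": 2},
    "items": [], "facts": {},
}
print(missing_common_fields(group))  # []
```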
Contracts¶
- JSON is source of truth for report semantics.
- Markdown and SARIF are deterministic projections from the same report document.
- MCP summary/finding/hotlist/report-section queries are deterministic views over the same canonical report document.
- SARIF is an IDE/code-scanning-oriented projection:
    - repo-relative result paths are anchored via `%SRCROOT%`
    - referenced files are listed under `run.artifacts`
    - clone results carry `baselineState` when clone novelty is known
- Derived layer (`suggestions`, `overview`, `hotlists`) does not replace canonical findings/metrics.
- Design findings are built once in the canonical report using the effective threshold policy recorded in `meta.analysis_thresholds.design_findings`; MCP and HTML must not re-synthesize them post-hoc from raw metric rows.
- Coverage design findings are built from canonical `coverage_join` rows only when a valid join is present. Invalid coverage input is represented as `metrics.families.coverage_join.summary.status="invalid"` with no hotspot item rows.
- HTML overview cards are materialized from canonical findings plus `derived.overview` + `derived.hotlists`; pre-expanded overview card payloads are not part of the report contract.
- `derived.overview.directory_hotspots` is a deterministic report-layer aggregation over canonical findings; HTML must render it as-is or omit it on compatibility paths without a canonical report document.
- `derived.overview.health_snapshot` is a projection over canonical `metrics.families.health.summary`; it summarizes the current score but does not define a second health model.
- `derived.overview.directory_hotspots[*].path` is an overview-oriented directory key: runtime findings keep their parent directory, while test-only and fixture-only findings collapse to the corresponding source-scope roots (`.../tests` or `.../tests/fixtures`) to avoid duplicating the same hotspot across leaf fixture paths.
- Overview hotspot/source-breakdown sections must resolve from canonical report data or deterministic derived IDs; HTML must not silently substitute stale placeholders such as `n/a` or empty-state cards when canonical data exists.
- `analysis_started_at_utc` and `report_generated_at_utc` are carried in `meta.runtime`; renderers/projections may use them for provenance but must not reinterpret them as semantic analysis data.
- Canonical `meta.scan_root` is normalized to `"."`; absolute runtime paths are exposed under `meta.runtime.*_absolute`.
- `clone_type` and `novelty` are group-level properties inside clone groups.
- Cohort-drift structural families are report-only and must not affect baseline diff or CI gating decisions.
- Dead-code suppressed candidates are carried only under metrics (`metrics.families.dead_code.suppressed_items`) and never promoted to active `findings.groups.dead_code`.
- Clone groups excluded by `golden_fixture_paths` are carried only under `findings.groups.clones.suppressed.*`; they do not contribute to active findings totals, health scoring, clone gating, or suggestion generation.
- A lower score after upgrade may reflect a broader health model, not only worse code. Report renderers may surface the score, but health-model expansion is documented separately in 15-health-score.md and compatibility notes.
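The SARIF-related contracts above (paths anchored via `%SRCROOT%`, files listed under `run.artifacts`, optional `baselineState`) can be sketched as follows. This is not the code in `codeclone/report/sarif.py`: the `build_run_fragment` helper, the flat input rows, the `clone-function` rule ID, and the mapping of known clones to SARIF's `unchanged` state are all assumptions. The output is only a fragment of a SARIF run; a complete run also needs a `tool` object.

```python
from __future__ import annotations


def build_run_fragment(rows: list[dict]) -> dict:
    """Build the artifacts/results part of a SARIF run from flat finding rows."""
    # Sorting the unique paths keeps the artifact order, and so the indexes, stable.
    paths = sorted({row["path"] for row in rows})
    index_of = {path: i for i, path in enumerate(paths)}
    artifacts = [{"location": {"uri": p, "uriBaseId": "%SRCROOT%"}} for p in paths]

    results = []
    for row in sorted(rows, key=lambda r: (r["path"], r["line"], r["rule_id"])):
        result = {
            "ruleId": row["rule_id"],
            "message": {"text": row["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {
                        "uri": row["path"],             # repo-relative path
                        "uriBaseId": "%SRCROOT%",       # anchor for relative URIs
                        "index": index_of[row["path"]], # position in artifacts
                    },
                    "region": {"startLine": row["line"]},
                }
            }],
        }
        # baselineState is emitted only when novelty is known (assumed mapping).
        if row.get("novelty") == "new":
            result["baselineState"] = "new"
        elif row.get("novelty") == "known":
            result["baselineState"] = "unchanged"
        results.append(result)
    return {"artifacts": artifacts, "results": results}


rows = [
    {"rule_id": "clone-function", "message": "Duplicated function", "path": "pkg/b.py", "line": 10, "novelty": "new"},
    {"rule_id": "clone-function", "message": "Duplicated function", "path": "pkg/a.py", "line": 3},
]
run = build_run_fragment(rows)
print([r["locations"][0]["physicalLocation"]["artifactLocation"]["index"] for r in run["results"]])  # [0, 1]
```

Deriving each `index` from the same sorted path list that produced `artifacts` is one simple way to keep the two aligned.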
Invariants (MUST)¶
- Stable ordering for groups/items/suggestions/hotlists.
- Stable ordering for SARIF rules, artifacts, and results.
- `derived.suggestions[*].finding_id` references existing canonical finding IDs.
- `derived.hotlists.*_ids` reference existing canonical finding IDs.
- SARIF `artifacts[*]` and `locations[*].artifactLocation.index` stay aligned.
- `integrity.digest` is computed from canonical sections only (derived excluded).
- `source_scope.impact_scope` is explicit and deterministic (`runtime`, `non_runtime`, `mixed`).
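The two referential invariants on `derived` can be checked with a sketch along these lines. The `dangling_derived_ids` and `iter_finding_ids` helpers are hypothetical. The traversal assumes that finding groups are dicts carrying an `id` inside lists nested under `findings.groups`, and it skips `suppressed` clone groups on the assumption that they are not active findings; neither detail is confirmed by this document.

```python
from __future__ import annotations

from typing import Iterator


def iter_finding_ids(node: object) -> Iterator[str]:
    """Yield group IDs found in lists nested anywhere under findings.groups."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "suppressed":  # assumption: suppressed groups are not active findings
                continue
            yield from iter_finding_ids(value)
    elif isinstance(node, list):
        for group in node:
            if isinstance(group, dict) and "id" in group:
                yield group["id"]


def dangling_derived_ids(report: dict) -> list[str]:
    """Return derived finding IDs that do not resolve to a canonical finding."""
    known = set(iter_finding_ids(report.get("findings", {}).get("groups", {})))
    derived = report.get("derived", {})
    referenced = [card["finding_id"] for card in derived.get("suggestions", [])]
    for name, ids in derived.get("hotlists", {}).items():
        if name.endswith("_ids"):
            referenced.extend(ids)
    return sorted(set(referenced) - known)


# Stand-in report with one deliberately dangling hotlist entry.
report = {
    "findings": {"groups": {
        "clones": {"functions": [{"id": "f1"}], "blocks": [], "segments": []},
        "design": {"groups": [{"id": "d1"}]},
    }},
    "derived": {
        "suggestions": [{"finding_id": "f1"}],
        "hotlists": {"most_actionable_ids": ["d1", "ghost"]},
    },
}
print(dangling_derived_ids(report))  # ['ghost']
```

An empty result means both invariants hold for the document being checked.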
Failure modes¶
| Condition | Behavior |
|---|---|
| Missing optional UI/meta fields | Renderer falls back to empty/(none) display |
| Untrusted baseline | Clone novelty resolves to new for all groups |
| Missing snippet source in HTML | Safe fallback snippet block |
Determinism / canonicalization¶
- Canonical payload is serialized with sorted keys for digest computation.
- Inventory file registry is normalized to relative paths.
- Structural findings are normalized, deduplicated, and sorted before serialization.
Refs:
- `codeclone/report/json_contract.py:_build_integrity_payload`
- `codeclone/report/json_contract.py:_build_inventory_payload`
- `codeclone/structural_findings.py:normalize_structural_findings`
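The idea of a digest over sorted-key canonical sections can be illustrated as below. This is not `_build_integrity_payload`: the `canonical_digest` helper, the choice of SHA-256, and the compact separators are assumptions made for the example. Only "sorted keys" and "canonical sections only" come from this document.

```python
from __future__ import annotations

import hashlib
import json

# Section names as listed in the canonical vs non-canonical split.
CANONICAL_SECTIONS = ("report_schema_version", "meta", "inventory", "findings", "metrics")


def canonical_digest(report: dict) -> str:
    """Hash only the canonical sections, serialized with sorted keys."""
    canonical = {key: report[key] for key in CANONICAL_SECTIONS if key in report}
    payload = json.dumps(
        canonical,
        sort_keys=True,         # key order must not influence the digest
        separators=(",", ":"),  # assumed compact form, no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()  # algorithm assumed


base = {"report_schema_version": "2.8", "meta": {"b": 1, "a": 2}, "findings": {}, "derived": {"x": 1}}
reordered = {"findings": {}, "meta": {"a": 2, "b": 1}, "report_schema_version": "2.8", "derived": {"x": 2}}
changed = {**base, "findings": {"groups": {}}}

print(canonical_digest(base) == canonical_digest(reordered))  # True
print(canonical_digest(base) == canonical_digest(changed))    # False
```

The first comparison shows that neither key order nor a change under `derived` affects the digest; the second shows that a change in a canonical section does.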
Locked by tests¶
- `tests/test_report.py::test_report_json_compact_v21_contract`
- `tests/test_report.py::test_report_json_integrity_matches_canonical_sections`
- `tests/test_report.py::test_report_json_integrity_ignores_derived_changes`
- `tests/test_report_contract_coverage.py::test_report_document_rich_invariants_and_renderers`
- `tests/test_report_contract_coverage.py::test_markdown_and_sarif_reuse_prebuilt_report_document`
- `tests/test_report_branch_invariants.py::test_overview_and_sarif_branch_invariants`
- `tests/test_report.py::test_json_includes_clone_guard_exit_divergence_structural_group`
- `tests/test_report.py::test_json_includes_clone_cohort_drift_structural_group`
- `tests/test_report.py::test_report_json_dead_code_suppressed_items_are_reported_separately`
Non-guarantees¶
- Human-readable wording in `derived` or HTML may evolve without schema bump.
- CSS/layout changes are not part of JSON contract.