Appendix B. Schema Layouts

Purpose

Compact structural layouts for the baseline, cache, and report contracts in the current 2.1 release line. The generator/package version shown in the JSON examples is illustrative; the actual version is defined in codeclone/contracts/__init__.py and pyproject.toml.

Baseline schema (`2.1`)

{
  "meta": {
    "generator": {
      "name": "codeclone",
      "version": "2.0.2"
    },
    "schema_version": "2.1",
    "fingerprint_version": "1",
    "python_tag": "cp314",
    "created_at": "2026-03-11T00:00:00Z",
    "payload_sha256": "...",
    "metrics_payload_sha256": "...",
    "api_surface_payload_sha256": "..."
  },
  "clones": {
    "functions": [
      "<fingerprint>|<loc_bucket>"
    ],
    "blocks": [
      "<block_hash>|<block_hash>|<block_hash>|<block_hash>"
    ]
  },
  "metrics": {
    "...": "optional embedded metrics snapshot"
  },
  "api_surface": {
    "...": "optional embedded public API snapshot"
  }
}
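
The `clones` entries above are pipe-joined strings. The following is a minimal sketch of splitting them, assuming the fingerprint contains no `|` and the loc bucket is the last field. `split_function_key` and `split_block_key` are hypothetical helpers for illustration, not part of the codeclone API.

```python
from typing import NamedTuple


class FunctionCloneKey(NamedTuple):
    fingerprint: str
    loc_bucket: str


def split_function_key(key: str) -> FunctionCloneKey:
    """Split '<fingerprint>|<loc_bucket>' at the last pipe (assumed layout)."""
    fingerprint, sep, loc_bucket = key.rpartition("|")
    if not sep or not fingerprint or not loc_bucket:
        raise ValueError(f"malformed function clone key: {key!r}")
    return FunctionCloneKey(fingerprint, loc_bucket)


def split_block_key(key: str) -> tuple[str, ...]:
    """Split a block key into its pipe-joined block hashes."""
    hashes = tuple(key.split("|"))
    if any(not block_hash for block_hash in hashes):
        raise ValueError(f"malformed block clone key: {key!r}")
    return hashes
```

The block helper does not enforce a hash count; the example above shows four hashes per key, but the exact window size is not stated here.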

Compact embedded api_surface symbol layout:

{
  "module": "pkg.mod",
  "filepath": "pkg/mod.py",
  "symbols": [
    {
      "local_name": "PublicClass.method",
      "kind": "method",
      "start_line": 10,
      "end_line": 14,
      "params": [],
      "returns_hash": "",
      "exported_via": "name"
    }
  ]
}

Notes:

local_name is stored on disk to avoid repeating the containing module path.
filepath is stored as a baseline-directory-relative wire path when possible, rather than as a machine-local absolute path.
Runtime reconstructs canonical full qualnames as module:local_name before API-surface diffing and restores runtime filepaths from the wire path.
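
The reconstruction described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the runtime's actual code: `full_qualnames` and `restore_filepath` are hypothetical names, and the rule that an absolute wire path is kept unchanged is an assumption drawn from the "when possible" wording.

```python
from pathlib import PurePosixPath


def full_qualnames(module_entry: dict) -> list[str]:
    """Rebuild canonical 'module:local_name' qualnames for one module entry."""
    module = module_entry["module"]
    return [f"{module}:{symbol['local_name']}" for symbol in module_entry["symbols"]]


def restore_filepath(wire_path: str, baseline_dir: str) -> str:
    """Resolve a baseline-directory-relative wire path to a runtime path."""
    path = PurePosixPath(wire_path)
    if path.is_absolute():
        # Assumption: a path that could not be made relative is stored as-is.
        return str(path)
    return str(PurePosixPath(baseline_dir) / path)


entry = {
    "module": "pkg.mod",
    "filepath": "pkg/mod.py",
    "symbols": [{"local_name": "PublicClass.method", "kind": "method"}],
}
names = full_qualnames(entry)
runtime_path = restore_filepath(entry["filepath"], "/repo/project")
```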

Standalone metrics-baseline schema (`1.2`)

{
  "meta": {
    "generator": {
      "name": "codeclone",
      "version": "2.0.2"
    },
    "schema_version": "1.2",
    "python_tag": "cp314",
    "created_at": "2026-03-11T00:00:00Z",
    "payload_sha256": "...",
    "api_surface_payload_sha256": "..."
  },
  "metrics": {
    "...": "metrics snapshot"
  },
  "api_surface": {
    "modules": [
      {
        "module": "pkg.mod",
        "filepath": "pkg/mod.py",
        "all_declared": [],
        "symbols": [
          {
            "local_name": "run",
            "kind": "function",
            "start_line": 10,
            "end_line": 14,
            "params": [],
            "returns_hash": "",
            "exported_via": "name"
          }
        ]
      }
    ]
  }
}

Cache schema (`2.10`)

{
  "v": "2.10",
  "payload": {
    "py": "cp314",
    "fp": "1",
    "ap": {
      "min_loc": 10,
      "min_stmt": 6,
      "block_min_loc": 20,
      "block_min_stmt": 8,
      "segment_min_loc": 20,
      "segment_min_stmt": 10,
      "collect_api_surface": false
    },
    "files": {
      "codeclone/cache/store.py": {
        "st": [
          1730000000000000000,
          2048
        ],
        "ss": [
          450,
          12,
          3,
          1
        ],
        "u": [
          [
            "qualname",
            1,
            2,
            2,
            1,
            "fp",
            "0-19",
            1,
            0,
            "low",
            "raw_hash",
            0,
            "none",
            0,
            "fallthrough",
            "none",
            "none"
          ]
        ],
        "b": [
          [
            "qualname",
            10,
            14,
            5,
            "block_hash"
          ]
        ],
        "s": [
          [
            "qualname",
            10,
            14,
            5,
            "segment_hash",
            "segment_sig"
          ]
        ],
        "cm": [
          [
            "qualname",
            1,
            30,
            3,
            2,
            4,
            2,
            "low",
            "low"
          ]
        ],
        "cc": [
          [
            "qualname",
            [
              "pkg.a",
              "pkg.b"
            ]
          ]
        ],
        "md": [
          [
            "pkg.a",
            "pkg.b",
            "import",
            10
          ]
        ],
        "dc": [
          [
            "pkg.a:unused_fn",
            "unused_fn",
            20,
            24,
            "function"
          ]
        ],
        "rn": [
          "used_name"
        ],
        "rq": [
          "pkg.dep:used_name"
        ],
        "in": [
          "pkg.dep"
        ],
        "cn": [
          "ClassName"
        ],
        "rr": [
          [
            "pkg.api:list_items",
            20,
            24,
            "function",
            "fastapi",
            "registers_handler",
            "medium",
            "route decorator",
            "router.get",
            "pkg.api:router"
          ]
        ],
        "sc": [
          [
            "process_boundary",
            "subprocess_run",
            "pkg.runner",
            "pkg.runner:run",
            10,
            10,
            "callable",
            "exact_call",
            "call",
            "subprocess.run"
          ]
        ],
        "sf": [
          [
            "duplicated_branches",
            "key",
            [
              [
                "stmt_seq",
                "Expr,Return"
              ]
            ],
            [
              [
                "pkg.a:f",
                10,
                12
              ]
            ]
          ]
        ]
      }
    }
  },
  "sig": "..."
}

Notes:

File keys are wire paths (repo-relative when root is configured).
Optional sections are omitted when empty.
ss stores per-file source stats and is required for full cache-hit accounting in discovery.
rn/rq are optional and decode to empty arrays when absent.
rr stores runtime reachability facts used to keep dead-code behavior equivalent between cold and cached runs.
Cached public-API symbol payloads preserve declaration order for params; canonicalization must not rewrite callable signature order.
The u row decoder accepts both legacy 11-column rows and canonical 17-column rows (legacy rows map the new structural fields to neutral defaults).
fr (schema 2.9+) stores per-function relationship facts: caller qualname plus relationship rows (relation_kind, resolution_status, origin_lane, target_qualname, line, expression, resolution_rule). Facts are rebuildable and are not part of the canonical report; schema 2.10 adds cross-file aggregation onto the analysis result and MCP run record.
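
Two of the decoding rules above (optional `rn`/`rq` sections and the dual-width `u` rows) can be sketched as below. The helper names are hypothetical, and the tail values are placeholders copied from columns 12 to 17 of the example row; the real neutral defaults are owned by the cache decoder and are not stated in this appendix.

```python
LEGACY_U_COLUMNS = 11
CANONICAL_U_COLUMNS = 17

# Assumption: placeholder defaults taken from the example row above.
NEUTRAL_TAIL = (0, "none", 0, "fallthrough", "none", "none")


def normalize_u_row(row: list) -> list:
    """Return a 17-column row, padding legacy 11-column rows with defaults."""
    if len(row) == CANONICAL_U_COLUMNS:
        return list(row)
    if len(row) == LEGACY_U_COLUMNS:
        return list(row) + list(NEUTRAL_TAIL)
    raise ValueError(f"unexpected u row width: {len(row)}")


def referenced_names(file_entry: dict) -> tuple[list, list]:
    """Decode the optional rn/rq sections, treating absence as empty."""
    return list(file_entry.get("rn", [])), list(file_entry.get("rq", []))
```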

Report schema (`2.11`)

{
  "report_schema_version": "2.11",
  "meta": {
    "codeclone_version": "2.0.2",
    "project_name": "codeclone",
    "scan_root": ".",
    "analysis_mode": "full",
    "report_mode": "full",
    "analysis_profile": {
      "min_loc": 10,
      "min_stmt": 6,
      "block_min_loc": 20,
      "block_min_stmt": 8,
      "segment_min_loc": 20,
      "segment_min_stmt": 10
    },
    "analysis_thresholds": {
      "design_findings": {
        "complexity": {
          "metric": "cyclomatic_complexity",
          "operator": ">",
          "value": 20
        },
        "coupling": {
          "metric": "cbo",
          "operator": ">",
          "value": 10
        },
        "cohesion": {
          "metric": "lcom4",
          "operator": ">=",
          "value": 4
        }
      }
    },
    "baseline": {
      "...": "..."
    },
    "cache": {
      "...": "..."
    },
    "metrics_baseline": {
      "...": "..."
    },
    "runtime": {
      "analysis_started_at_utc": "2026-03-11T08:36:29Z",
      "report_generated_at_utc": "2026-03-11T08:36:32Z"
    }
  },
  "inventory": {
    "files": {
      "...": "..."
    },
    "code": {
      "...": "..."
    },
    "file_registry": {
      "encoding": "relative_path",
      "items": []
    }
  },
  "findings": {
    "summary": {
      "...": "...",
      "suppressed": {
        "dead_code": 0,
        "clones": 1
      }
    },
    "groups": {
      "clones": {
        "functions": [],
        "blocks": [],
        "segments": [],
        "suppressed": {
          "functions": [
            {
              "...": "..."
            }
          ],
          "blocks": [],
          "segments": []
        }
      },
      "structural": {
        "groups": [
          {
            "kind": "duplicated_branches",
            "...": "..."
          },
          {
            "kind": "clone_guard_exit_divergence",
            "...": "..."
          },
          {
            "kind": "clone_cohort_drift",
            "...": "..."
          }
        ]
      },
      "dead_code": {
        "groups": []
      },
      "design": {
        "groups": []
      }
    }
  },
  "metrics": {
    "summary": {
      "...": "...",
      "dead_code": {
        "total": 0,
        "high_confidence": 0,
        "suppressed": 1
      },
      "overloaded_modules": {
        "total": 0,
        "candidates": 0,
        "population_status": "limited",
        "top_score": 0.0,
        "average_score": 0.0
      },
      "coverage_adoption": {
        "modules": 0,
        "params_total": 0,
        "params_annotated": 0,
        "param_permille": 0,
        "returns_total": 0,
        "returns_annotated": 0,
        "return_permille": 0,
        "public_symbol_total": 0,
        "public_symbol_documented": 0,
        "docstring_permille": 0,
        "typing_any_count": 0
      },
      "coverage_join": {
        "status": "ok",
        "source": "coverage.xml",
        "files": 0,
        "units": 0,
        "measured_units": 0,
        "overall_executable_lines": 0,
        "overall_covered_lines": 0,
        "overall_permille": 0,
        "missing_from_report_units": 0,
        "coverage_hotspots": 0,
        "scope_gap_hotspots": 0,
        "hotspot_threshold_percent": 50,
        "invalid_reason": null
      },
      "api_surface": {
        "enabled": false,
        "modules": 0,
        "public_symbols": 0,
        "added": 0,
        "breaking": 0,
        "strict_types": false
      },
      "security_surfaces": {
        "items": 0,
        "modules": 0,
        "exact_items": 0,
        "category_count": 0,
        "production": 0,
        "tests": 0,
        "fixtures": 0,
        "other": 0,
        "report_only": true
      }
    },
    "families": {
      "complexity": {},
      "coupling": {},
      "cohesion": {},
      "dependencies": {},
      "dead_code": {
        "summary": {
          "total": 0,
          "high_confidence": 0,
          "suppressed": 1
        },
        "items": [],
        "suppressed_items": [
          {
            "...": "..."
          }
        ],
        "runtime_reachability": {
          "summary": {
            "total": 0,
            "by_framework": {},
            "by_edge_kind": {},
            "by_confidence": {}
          },
          "items": []
        }
      },
      "overloaded_modules": {
        "summary": {
          "total": 0,
          "candidates": 0,
          "population_status": "limited",
          "top_score": 0.0,
          "average_score": 0.0
        },
        "detection": {
          "version": "1",
          "scope": "report_only",
          "strategy": "project_relative_composite"
        },
        "items": []
      },
      "coverage_adoption": {
        "summary": {
          "modules": 0,
          "params_total": 0,
          "params_annotated": 0,
          "param_permille": 0,
          "baseline_diff_available": false,
          "param_delta": 0,
          "returns_total": 0,
          "returns_annotated": 0,
          "return_permille": 0,
          "return_delta": 0,
          "public_symbol_total": 0,
          "public_symbol_documented": 0,
          "docstring_permille": 0,
          "docstring_delta": 0,
          "typing_any_count": 0
        },
        "items": []
      },
      "coverage_join": {
        "summary": {
          "status": "ok",
          "source": "coverage.xml",
          "files": 0,
          "units": 0,
          "measured_units": 0,
          "overall_executable_lines": 0,
          "overall_covered_lines": 0,
          "overall_permille": 0,
          "missing_from_report_units": 0,
          "coverage_hotspots": 0,
          "scope_gap_hotspots": 0,
          "hotspot_threshold_percent": 50,
          "invalid_reason": null
        },
        "items": []
      },
      "api_surface": {
        "summary": {
          "enabled": false,
          "baseline_diff_available": false,
          "modules": 0,
          "public_symbols": 0,
          "added": 0,
          "breaking": 0,
          "strict_types": false
        },
        "items": []
      },
      "security_surfaces": {
        "summary": {
          "items": 0,
          "modules": 0,
          "exact_items": 0,
          "category_count": 0,
          "categories": {},
          "by_source_kind": {
            "production": 0,
            "tests": 0,
            "fixtures": 0,
            "other": 0
          },
          "production": 0,
          "tests": 0,
          "fixtures": 0,
          "other": 0,
          "report_only": true
        },
        "items": []
      },
      "health": {}
    }
  },
  "derived": {
    "suggestions": [],
    "overview": {
      "families": {
        "clones": 0,
        "structural": 0,
        "dead_code": 0,
        "design": 0
      },
      "top_risks": [],
      "source_scope_breakdown": {
        "production": 0,
        "tests": 0,
        "fixtures": 0
      },
      "health_snapshot": {
        "score": 100,
        "grade": "A"
      },
      "directory_hotspots": {
        "...": "..."
      }
    },
    "hotlists": {
      "most_actionable_ids": [],
      "highest_spread_ids": [],
      "production_hotspot_ids": [],
      "test_fixture_hotspot_ids": []
    }
  },
  "integrity": {
    "canonicalization": {
      "version": "1",
      "scope": "canonical_only",
      "sections": [
        "report_schema_version",
        "meta",
        "inventory",
        "findings",
        "metrics"
      ]
    },
    "digest": {
      "verified": true,
      "algorithm": "sha256",
      "value": "..."
    }
  }
}
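
Each entry under `meta.analysis_thresholds.design_findings` pairs a metric with an operator and a value. A consumer could evaluate such an entry roughly as follows; `exceeds_threshold` is a hypothetical helper, and only the two operators that appear in the layout above are handled.

```python
import operator

# Only ">" and ">=" appear in the layout above; other operators are not assumed.
OPERATORS = {">": operator.gt, ">=": operator.ge}


def exceeds_threshold(threshold: dict, observed: float) -> bool:
    """Apply one threshold entry (metric/operator/value) to an observed value."""
    try:
        compare = OPERATORS[threshold["operator"]]
    except KeyError:
        raise ValueError(f"unsupported operator: {threshold['operator']!r}") from None
    return compare(observed, threshold["value"])


coupling = {"metric": "cbo", "operator": ">", "value": 10}
cohesion = {"metric": "lcom4", "operator": ">=", "value": 4}
```

With these entries, a `cbo` of exactly 10 does not cross the coupling threshold, while an `lcom4` of exactly 4 does cross the cohesion threshold, because the two families use different operators.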

Markdown projection (`1.0`)

# CodeClone Report
- Markdown schema: 1.0
- Source report schema: 2.11
...
## Overview
## Inventory
## Findings Summary
## Top Risks
## Suggestions
## Findings
## Metrics
## Integrity

SARIF projection (`2.1.0`, profile `1.0`)

{
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "version": "2.1.0",
  "runs": [
    {
      "originalUriBaseIds": {
        "%SRCROOT%": {
          "uri": "file:///repo/project/",
          "description": {
            "text": "The root of the scanned source tree."
          }
        }
      },
      "tool": {
        "driver": {
          "name": "codeclone",
          "version": "2.0.2",
          "rules": [
            {
              "id": "CCLONE001",
              "name": "codeclone.CCLONE001",
              "shortDescription": {
                "text": "Function clone group"
              },
              "fullDescription": {
                "text": "Multiple functions share the same normalized function body."
              },
              "help": {
                "text": "...",
                "markdown": "..."
              },
              "defaultConfiguration": {
                "level": "warning"
              },
              "helpUri": "https://orenlab.github.io/codeclone/",
              "properties": {
                "category": "clone",
                "kind": "clone_group",
                "precision": "high",
                "tags": [
                  "clone",
                  "clone_group",
                  "high"
                ]
              }
            }
          ]
        }
      },
      "automationDetails": {
        "id": "codeclone/full/2026-03-11T08:36:32Z"
      },
      "artifacts": [
        {
          "location": {
            "uri": "codeclone/report/renderers/sarif.py",
            "uriBaseId": "%SRCROOT%"
          }
        }
      ],
      "invocations": [
        {
          "executionSuccessful": true,
          "startTimeUtc": "2026-03-11T08:36:29Z",
          "workingDirectory": {
            "uri": "file:///repo/project/"
          }
        }
      ],
      "properties": {
        "profileVersion": "1.0",
        "reportSchemaVersion": "2.11"
      },
      "results": [
        {
          "kind": "fail",
          "ruleId": "CCLONE001",
          "ruleIndex": 0,
          "baselineState": "new",
          "message": {
            "text": "Function clone group (Type-2), 2 occurrences across 2 files."
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "codeclone/report/renderers/sarif.py",
                  "uriBaseId": "%SRCROOT%",
                  "index": 0
                },
                "region": {
                  "startLine": 1,
                  "endLine": 10
                }
              },
              "logicalLocations": [
                {
                  "fullyQualifiedName": "codeclone.report.sarif:render_sarif_report_document"
                }
              ],
              "message": {
                "text": "Representative occurrence"
              }
            }
          ],
          "properties": {
            "primaryPath": "codeclone/report/renderers/sarif.py",
            "primaryQualname": "codeclone.report.sarif:render_sarif_report_document",
            "primaryRegion": "1:10"
          },
          "relatedLocations": [],
          "partialFingerprints": {
            "primaryLocationLineHash": "0123456789abcdef:1"
          }
        }
      ]
    }
  ]
}
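
Result locations in the layout above are relative to `%SRCROOT%`. The following sketch shows one way a consumer might resolve them against `originalUriBaseIds` using only the standard library; `resolved_locations` is a hypothetical helper, not a codeclone function.

```python
from urllib.parse import urljoin


def resolved_locations(sarif: dict) -> list[tuple[str, str, int, int]]:
    """Return (ruleId, resolved URI, startLine, endLine) for every location."""
    rows = []
    for run in sarif.get("runs", []):
        bases = run.get("originalUriBaseIds", {})
        for result in run.get("results", []):
            for location in result.get("locations", []):
                physical = location["physicalLocation"]
                artifact = physical["artifactLocation"]
                base = bases.get(artifact.get("uriBaseId", ""), {}).get("uri", "")
                # Fall back to the relative URI when no base is declared.
                uri = urljoin(base, artifact["uri"]) if base else artifact["uri"]
                region = physical.get("region", {})
                rows.append(
                    (
                        result["ruleId"],
                        uri,
                        region.get("startLine", 0),
                        region.get("endLine", 0),
                    )
                )
    return rows
```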

TXT report sections

REPORT METADATA
INVENTORY
FINDINGS SUMMARY
METRICS SUMMARY
DERIVED OVERVIEW
SUGGESTIONS
FUNCTION CLONES (NEW)
FUNCTION CLONES (KNOWN)
BLOCK CLONES (NEW)
BLOCK CLONES (KNOWN)
SEGMENT CLONES (NEW)
SEGMENT CLONES (KNOWN)
STRUCTURAL FINDINGS
DEAD CODE FINDINGS
DESIGN FINDINGS
INTEGRITY

Engineering Memory schema (`1.7`)

SQLite database at .codeclone/memory/engineering_memory.sqlite3 (default). Schema version stored in memory_meta.schema_version.

Core tables:

Table	Role
`memory_records`	Typed statements with status, confidence, origin, payload
`memory_subjects`	Path/symbol/module links (`subject_kind`, `subject_key`)
`memory_evidence`	Deterministic evidence refs (report, git_commit, doc, …)
`memory_fts`	FTS5 search index (schema 1.1+)
`memory_revisions`	Governance audit trail
`memory_ingestion_runs`	Init/refresh run metadata
`memory_projection_jobs`	Coalesced trajectory/semantic/Experience jobs (schema 1.3+); `flush_claimed_by` flush-scheduling slot (schema 1.7+)

Trajectory tables (schema 1.2+ trajectory DDL, active projection trajectory-v3):

Table	Role
`memory_trajectories`	One row per `(project_id, workflow_id, projection_version)` with quality score
`memory_trajectory_steps`	Ordered audit steps with frozen `event_core_json`
`memory_trajectory_subjects`	Path/module subjects linked to a trajectory
`memory_trajectory_evidence`	Report/run/audit evidence refs
`memory_trajectory_patch_trails`	Patch Trail JSON + digest per trajectory (schema `1.4`, Phase 26)
`memory_trajectory_projection_runs`	Rebuild run manifest

Experience tables (schema 1.6, derived from trajectory evidence):

Table	Role
`memory_experiences`	Advisory distilled patterns (`experience-v1`)
`memory_experience_facets`	Agent-family facets today; profile/intent kinds are reserved
`memory_experience_evidence`	Contributing trajectory ids and outcomes

Patch Trail JSON uses PATCH_TRAIL_SCHEMA_VERSION (currently 1), defined in codeclone/contracts/__init__.py. Trajectory JSONL export rows use TRAJECTORY_EXPORT_SCHEMA_VERSION (currently 2), defined in codeclone/memory/trajectory/profiles.py. Both are versioned separately from the SQLite schema.

Record identity uses stable identity_key strings for upsert during refresh. Migration path: codeclone/memory/schema_migrate.py.
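
As a quick sanity check on a local store, the core tables listed above can be looked up in `sqlite_master`. This is a sketch only: `missing_core_tables` is a hypothetical helper, and it checks table names, not column layouts or the stored schema version.

```python
import sqlite3

# Core table names as listed in this appendix.
CORE_TABLES = frozenset(
    {
        "memory_records",
        "memory_subjects",
        "memory_evidence",
        "memory_fts",
        "memory_revisions",
        "memory_ingestion_runs",
        "memory_projection_jobs",
    }
)


def missing_core_tables(conn: sqlite3.Connection) -> set[str]:
    """Return the core table names that are absent from the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    present = {name for (name,) in rows}
    return set(CORE_TABLES - present)
```

A caller would open the database file with `sqlite3.connect(...)` and treat a non-empty result as a sign that the store predates the current schema or was not initialized.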

See Engineering Memory for lifecycle and agent surfaces.

Semantic index sidecar (format `2`)

Optional LanceDB directory (default .codeclone/memory/semantic_index.lance). Format version constant: SEMANTIC_INDEX_FORMAT_VERSION in codeclone/contracts/__init__.py (currently 2).

Table columns (PyArrow):

Column	Type	Notes
`id`	string	Row id; chunk rows use `trajectory:{id}:chunk:NNN`
`source`	string	`memory` / `audit` / `trajectory`
`parent_id`	string (nullable)	Trajectory id for chunk rows; null for single-row
`chunk_index`	int32 (nullable)	Zero-based chunk index
`chunk_count`	int32 (nullable)	Total chunks for the parent trajectory
`project_id`	string
`subject_path`	string
`kind`	string
`status`	string
`text_hash`	string	Chunk text hash (idempotent upsert key)
`embedding_model`	string
`vector`	float32 list	Fixed embedding dimension

Trajectory projections longer than the embedding model window are split into deterministic token-aligned chunks (strategy version SEMANTIC_CHUNK_STRATEGY_VERSION in codeclone/memory/semantic/chunking.py). Single-chunk trajectories keep the trajectory id as id with null chunk fields. Retrieval collapses chunk hits to one score per parent trajectory.
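
The chunk-collapse step can be illustrated as below. This sketch assumes that higher scores are better and that the per-trajectory score is the best chunk score; the aggregation the retriever actually uses is not specified here, and `collapse_chunk_hits` is a hypothetical name.

```python
from typing import Iterable, Optional


def collapse_chunk_hits(
    hits: Iterable[tuple[str, Optional[str], float]],
) -> dict[str, float]:
    """Collapse (row id, parent_id, score) hits to one score per trajectory."""
    best: dict[str, float] = {}
    for row_id, parent_id, score in hits:
        # Single-row trajectories carry a null parent_id and use their own id.
        trajectory_id = parent_id if parent_id is not None else row_id
        if score > best.get(trajectory_id, float("-inf")):
            best[trajectory_id] = score
    return best


hits = [
    ("trajectory:t1:chunk:000", "t1", 0.4),
    ("trajectory:t1:chunk:001", "t1", 0.9),
    ("t2", None, 0.5),
]
collapsed = collapse_chunk_hits(hits)
```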

Not governed by ENGINEERING_MEMORY_SCHEMA_VERSION — bumping memory SQLite schema does not automatically invalidate the vector sidecar.
Rebuild on incompatible format bumps (codeclone memory semantic rebuild); no SQLite migration path for the sidecar.
Row/projection semantics: Engineering Memory; bump rules: 24-compatibility-and-versioning.md.

Platform Observability schema (`1.1`)

Optional local SQLite database at .codeclone/db/platform_observability.sqlite3. It is disposable development telemetry, not report, baseline, cache, audit, or Engineering Memory truth.

Table	Role
`platform_meta`	Schema version metadata.
`platform_operations`	Surface-level operation identity, correlation, duration, status, bounded payload sizes, and optional process metrics.
`platform_spans`	Ordered subsystem timing, reason/dedupe metadata, counters, normalized SQL fingerprints, and optional process metrics.

Operation and span rows are persisted together in one transaction. Profile columns are nullable and populated only when profiling is enabled with codeclone[perf]. db_fingerprints is additively migrated for older local stores.

See Platform Observability for configuration, privacy, query, and anti-inference rules.

Corpus analytics store (`1.2`)

Optional SQLite database (default .codeclone/analytics/corpus_clustering.sqlite3) and LanceDB vector directory (default .codeclone/analytics/corpus_vectors). Derived offline analytics — not report, baseline, cache, audit, or Engineering Memory truth.

Artifact	Role
`corpus_snapshots`	Immutable-by-contract snapshot metadata and source digests
`corpus_items`	Normalized representation, metadata, and optional registry overlay
`embedding_generations`	Provider/model/preprocessing manifest
`embedding_items`	Vector row keys, float32 digests, dimensions; no vector blobs
`clustering_runs`	Requested/effective parameters, algorithm manifest, lifecycle status
`cluster_assignments`	Per-run item label, strength, and membership digest
`cluster_summaries`	Canonical display id and persisted diagnostics per cluster/noise
`profile_manifest_snapshots`	Immutable canonical manifest values, labels, and descriptions
`profile_batches`	One immutable execution receipt per profile sweep
`profile_batch_runs`	Ordered effective-parameter membership for each batch
`profile_assessments`	Technical-validity-aware suitability facts for batch members
`run_selections`	Append-only global or profile-batch maintainer decisions
LanceDB sidecar	Separate float32 vectors from Engineering Memory semantic index

Store schema version: CORPUS_ANALYTICS_STORE_SCHEMA_VERSION in codeclone/contracts/__init__.py (currently 1.2).

Writable open chains 1.0 → 1.1 → 1.2; it never skips the intermediate integrity migration. Read-only open never migrates and rejects stale schema. SQLite triggers prevent orphan-producing inserts, relationship updates, and parent deletes; unique indexes protect vector row keys, non-null display cluster ids, and one effective candidate per profile batch.
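
The open rules above can be sketched as a small planner. The function names are hypothetical and the sketch only computes which migration steps would run; it does not touch SQLite.

```python
# Version chain from this appendix; each writable open walks it step by step.
MIGRATION_CHAIN = ("1.0", "1.1", "1.2")


def migration_steps(current: str, target: str = "1.2") -> list[tuple[str, str]]:
    """Return consecutive (from, to) steps, never skipping an intermediate one."""
    if current not in MIGRATION_CHAIN or target not in MIGRATION_CHAIN:
        raise ValueError(f"unknown schema version: {current!r} -> {target!r}")
    start = MIGRATION_CHAIN.index(current)
    end = MIGRATION_CHAIN.index(target)
    if start > end:
        raise ValueError("downgrades are not supported in this sketch")
    return [(MIGRATION_CHAIN[i], MIGRATION_CHAIN[i + 1]) for i in range(start, end)]


def plan_open(current: str, *, read_only: bool) -> list[tuple[str, str]]:
    """Read-only opens reject stale schemas; writable opens plan the chain."""
    latest = MIGRATION_CHAIN[-1]
    if read_only:
        if current != latest:
            raise RuntimeError(f"stale schema {current}; read-only open never migrates")
        return []
    return migration_steps(current, latest)
```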

Profile state is an overlay over immutable clustering facts:

erDiagram
    CORPUS_SNAPSHOTS ||--o{ CLUSTERING_RUNS : owns
    EMBEDDING_GENERATIONS ||--o{ CLUSTERING_RUNS : supplies
    PROFILE_MANIFEST_SNAPSHOTS ||--o{ PROFILE_BATCHES : fixes
    PROFILE_BATCHES ||--o{ PROFILE_BATCH_RUNS : contains
    CLUSTERING_RUNS ||--o{ PROFILE_BATCH_RUNS : participates
    PROFILE_BATCHES ||--o{ PROFILE_ASSESSMENTS : assesses
    CLUSTERING_RUNS ||--o{ PROFILE_ASSESSMENTS : receives
    PROFILE_BATCHES o|--o{ RUN_SELECTIONS : scopes
    CLUSTERING_RUNS ||--o{ RUN_SELECTIONS : selects

clustering_runs has no profile columns. A re-sweep creates a new profile_batches row with the exact manifest and candidate-space digests. run_selections supersedes earlier active heads in the same (snapshot_id, embedding_generation_id, profile_batch_id) scope. A null batch means global selection. The legacy selected_by_maintainer field is a global-scope mirror only.

The SQLite transaction and LanceDB sidecar cannot share one physical transaction. The embedding workflow therefore writes metadata and vectors as one controlled operation, rolls SQLite back and removes the generation on ordinary failures, and validates row keys, dimensions, and float32 digests before clustering. Crash residue is detected as an integrity error rather than accepted as a completed generation.

Corpus analytics JSON export (`1.3`)

CORPUS_EXPORT_SCHEMA_VERSION = "1.3" projects interpretation contract 1.1 and control-plane contract 1.0 over store schema 1.2; it does not migrate the SQLite database.

export
├── schema_version = "1.3"
├── interpretation_contract_version = "1.1"
├── control_plane_contract_version = "1.0"
├── snapshot
├── embedding_generation | null
├── embedding_items[]
├── clustering_run
│   ├── validity
│   ├── presentation
│   ├── profile_context?       # stored batch assessment + manifest snapshot
│   ├── selection?             # active event for export scope
│   └── partition_metrics | diagnostic_facts
├── clusters[]                 # full mode only
│   └── interpretation
│       ├── representative_previews[]
│       ├── boundary_previews[]
│       ├── categorical_correlations
│       ├── numeric_summaries
│       ├── provenance_completeness
│       └── machine_inspectability_signals
├── assignments[]              # full mode only
├── noise_items[]              # full mode only
├── profile_summary?           # sweep export, sibling of comparison_summary
└── content_disclosure

clustering_run.validity.failed_invariants is a deterministic ordered subset of V1 through V10. presentation.projection_mode is full_interpretation only when every invariant passes; otherwise it is limited_diagnostic, partition_metrics is omitted, score is null, and cluster/item interpretation arrays are absent.
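
The relationship between failed invariants and the projection mode can be sketched as follows. `presentation_for` is a hypothetical helper, and ordering failures by invariant number is an assumption about what "deterministic ordered" means here.

```python
# Invariant names V1 through V10, in numeric order.
INVARIANTS = tuple(f"V{n}" for n in range(1, 11))


def presentation_for(failed: set[str]) -> dict:
    """Derive failed_invariants and projection_mode from a set of failures."""
    unknown = failed - set(INVARIANTS)
    if unknown:
        raise ValueError(f"unknown invariants: {sorted(unknown)}")
    # Assumption: the deterministic order follows V1..V10.
    ordered = [name for name in INVARIANTS if name in failed]
    mode = "limited_diagnostic" if ordered else "full_interpretation"
    return {"failed_invariants": ordered, "projection_mode": mode}
```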

Sweep comparison exports every persisted run for the requested snapshot and embedding generation. Each candidate has a sibling comparison object:

{
  "score": null,
  "rank": null,
  "recommended_by_heuristic": false,
  "dominant_cluster_ratio": null,
  "dominant_assigned_ratio": null,
  "largest_cluster_size": null
}

Only technically valid candidates receive non-null comparison metrics, score, and rank. Profile-scoped candidates add profile_suitable and is_profile_recommended. comparison_summary preserves its Slice 1.1 keys; profile_summary is a sibling with batch identity, manifest snapshot metadata, suitability counts, recommendation rationale, and active selection.

content_disclosure is derived from the final payload. Its preview scopes are cluster_representatives, cluster_boundaries, and noise_items; the contract limit is 240 Unicode code points. Default items[] entries do not contain normalized text previews.
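
Because the limit is expressed in Unicode code points, a Python consumer can check it with `len`, which counts code points for `str`. The helpers below are hypothetical; whether the exporter clips or rejects overlong previews is not stated here, so `clip_preview` is only one possible handling.

```python
PREVIEW_LIMIT = 240  # contract limit in Unicode code points


def within_preview_limit(preview: str) -> bool:
    """True when the preview respects the 240 code point limit."""
    return len(preview) <= PREVIEW_LIMIT


def clip_preview(text: str) -> str:
    """Clip text to the limit (assumed handling, not confirmed behavior)."""
    return text[:PREVIEW_LIMIT]
```

Counting code points rather than bytes means a 240-character preview made of non-ASCII text still passes, even though its UTF-8 encoding is longer than 240 bytes.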

Corpus representation contract (`3`)

New intent snapshots persist explicit provenance booleans inside corpus_items.metadata_json:

{
  "provenance": {
    "trajectory": {"selected": false},
    "patch_trail": {"present": false},
    "registry_overlay": {"present": false}
  }
}

Trajectory and Patch Trail identity evidence retains its existing digest behavior. Registry-overlay content and presence remain advisory and excluded from source identity. Contract-2 snapshots are immutable and are interpreted with conservative legacy rules; they are not rewritten.

See Corpus Analytics for CLI, configuration, and trust boundaries; Profile Control Plane for batch and selection semantics; and Report Interpretability for validity and privacy rules.

Subsystem-local version constants

These constants are not defined in codeclone/contracts/__init__.py; bump them in the owning module.

Constant	Value	Owner
`AUDIT_EVENT_CORE_VERSION`	`2`	`codeclone/audit/events.py`
`CONTEXT_CONTRACT_VERSION`	`1`	`codeclone/surfaces/mcp/_implementation_context.py`
`CALL_RESOLUTION_VERSION`	`1`	`codeclone/surfaces/mcp/_implementation_context.py`
`TRAJECTORY_EXPORT_SCHEMA_VERSION`	`2`	`codeclone/memory/trajectory/profiles.py`

Central corpus and governance constants (defined in codeclone/contracts/__init__.py):

Constant	Value
`IDE_GOVERNANCE_PROTOCOL_VERSION`	`2`
`CORPUS_ANALYTICS_STORE_SCHEMA_VERSION`	`1.2`
`CORPUS_EXPORT_SCHEMA_VERSION`	`1.3`
`CORPUS_PROFILE_MANIFEST_SCHEMA_VERSION`	`1`
`CORPUS_CONTROL_PLANE_CONTRACT_VERSION`	`1.0`
`CORPUS_REPRESENTATION_CONTRACT_VERSION`	`3`
`CORPUS_NORMALIZER_VERSION`	`1`
`CORPUS_EMBEDDING_CONTRACT_VERSION`	`2`
`CORPUS_AGENT_LABEL_CONTRACT_VERSION`	`1`
`CORPUS_PARTITION_MAP_VERSION`	`1`

Refs

codeclone/baseline/clone_baseline.py
codeclone/cache/store.py
codeclone/memory/schema.py
codeclone/memory/schema_trajectory.py
codeclone/memory/schema_migrate.py
codeclone/memory/semantic/models.py
codeclone/observability/store/schema.py
codeclone/contracts/__init__.py
codeclone/audit/events.py
codeclone/surfaces/mcp/_implementation_context.py
codeclone/report/document/builder.py
codeclone/report/renderers/text.py
codeclone/report/renderers/markdown.py
codeclone/report/renderers/sarif.py
