01. Architecture Map
Purpose
Document current module boundaries and ownership in CodeClone v2.x.
Public surface
Main ownership layers:
- Core detection pipeline: scanner -> extractor -> cfg/normalize/blocks -> grouping.
- Quality metrics pipeline: complexity/coupling/cohesion/dependencies/dead-code/health.
- Contracts and persistence: baseline, metrics baseline, cache, exit semantics.
- Report model and projections: canonical JSON + deterministic TXT/Markdown/SARIF + explainability facts.
- MCP agent surface: read-only server layer over the same pipeline/report contracts.
- VS Code extension surface: native IDE client over the MCP layer and the same canonical report semantics, with limited Restricted Mode, a source-first review flow, and factual overview surfaces such as Coverage Join when MCP exposes them.
- Claude Desktop bundle surface: installable local .mcpb wrapper that launches the same codeclone-mcp server for Claude Desktop without introducing a second MCP or analysis layer.
- Codex plugin surface: repo-local Codex plugin under plugins/ and .agents/plugins/marketplace.json that adds native plugin discovery, a local MCP definition, and a CodeClone review skill over the same server.
- Render layer: HTML rendering and template assets.
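The core detection chain (scanner -> extractor -> normalize -> grouping) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CodeClone's implementation: the names scan, extract_functions, fingerprint, and group_clones are hypothetical, normalization is simplified to replacing identifiers, argument names, and constants with placeholders, and whole functions are hashed instead of building CFGs or block/segment windows.

```python
import ast
import copy
import hashlib
from collections import defaultdict
from pathlib import Path


class _Normalizer(ast.NodeTransformer):
    """Hypothetical simplified normalization: erase names, argument names, and literal values."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        return ast.copy_location(ast.Constant(value=0), node)


def scan(root: Path) -> dict[str, str]:
    # Scanner stage (hypothetical): explicit sort so order never depends on filesystem listing order.
    return {
        path.relative_to(root).as_posix(): path.read_text(encoding="utf-8")
        for path in sorted(root.rglob("*.py"))
    }


def extract_functions(source: str) -> list[ast.FunctionDef]:
    # Extractor stage (hypothetical): parse the module and collect function units.
    tree = ast.parse(source)
    return [node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]


def fingerprint(func: ast.FunctionDef) -> str:
    # Normalize + fingerprint stage (hypothetical): hash the dump of a normalized copy.
    normalized = _Normalizer().visit(copy.deepcopy(func))
    normalized.name = "_"
    return hashlib.sha256(ast.dump(normalized).encode("utf-8")).hexdigest()


def group_clones(sources: dict[str, str]) -> dict[str, list[str]]:
    # Grouping stage (hypothetical): bucket units by fingerprint, keep groups with 2+ members.
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(sources):
        for func in extract_functions(sources[path]):
            groups[fingerprint(func)].append(f"{path}:{func.name}")
    return {key: members for key, members in sorted(groups.items()) if len(members) > 1}
```

Calling group_clones(scan(Path("some/project"))) would map each shared fingerprint to the functions that produce it. The sort at each stage is what keeps the grouping order stable across runs.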
Data model
| Layer | Modules | Responsibility |
|---|---|---|
| Contracts | codeclone/contracts.py, codeclone/errors.py | Shared schema versions, URLs, exit-code enum, typed exceptions |
| Domain models | codeclone/models.py, codeclone/domain/*.py | Typed dataclasses/enums plus centralized finding/scope/severity taxonomies |
| Discovery + parsing | codeclone/scanner.py, codeclone/extractor.py | Enumerate files, parse AST, extract function/block/segment units |
| Structural analysis | codeclone/cfg.py, codeclone/normalize.py, codeclone/fingerprint.py, codeclone/blocks.py | CFG, normalization, statement hashes, block/segment windows |
| Grouping | codeclone/grouping.py | Build function/block/segment groups |
| Metrics | codeclone/metrics/* | Compute complexity/coupling/cohesion/dependency/dead-code/health signals |
| Report core | codeclone/report/*, codeclone/_cli_meta.py | Canonical report building, deterministic projections, explainability facts, and shared metadata |
| Persistence | codeclone/baseline.py, codeclone/metrics_baseline.py, codeclone/cache.py | Baseline/cache trust/compat/integrity and atomic persistence |
| Runtime orchestration | codeclone/pipeline.py, codeclone/cli.py, codeclone/_cli_args.py, codeclone/_cli_paths.py, codeclone/_cli_summary.py, codeclone/_cli_config.py, codeclone/ui_messages.py | CLI UX, stage orchestration, status handling, outputs, error markers |
| MCP agent interface | codeclone/mcp_service.py, codeclone/mcp_server.py | Read-only MCP tools/resources over canonical analysis and report layers |
| VS Code extension | extensions/vscode-codeclone/* | Native VS Code control surface over MCP, with limited Restricted Mode, triage-first review, and source-first drill-down |
| Claude Desktop bundle | extensions/claude-desktop-codeclone/* | Installable local MCPB wrapper over codeclone-mcp, keeping Claude Desktop on the canonical read-only MCP surface |
| Codex plugin | plugins/codeclone/*, .agents/plugins/marketplace.json | Native Codex plugin surface over codeclone-mcp, with repo-local discovery metadata and CodeClone skill guidance |
| Rendering | codeclone/html_report.py, codeclone/_html_report/*, codeclone/_html_badges.py, codeclone/_html_js.py, codeclone/_html_escape.py, codeclone/_html_snippets.py, codeclone/templates.py | HTML-only view layer over report data |
Refs:
- codeclone/pipeline.py
- codeclone/cli.py:_main_impl
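The layer table can back a mechanical import check; the Contracts section requires that core analysis modules not depend on render/UI modules, and tests/test_architecture.py::test_architecture_layer_violations locks layer boundaries. The sketch below shows one way such a check can work. It is hypothetical: LAYER_OF and FORBIDDEN cover only a few modules from the table, relative imports are ignored, and none of it is taken from that test.

```python
import ast

# Hypothetical module -> layer mapping covering a few rows of the layer table.
LAYER_OF = {
    "codeclone.scanner": "core",
    "codeclone.extractor": "core",
    "codeclone.grouping": "core",
    "codeclone.html_report": "render",
    "codeclone.templates": "render",
}

# Hypothetical (importing layer, imported layer) pairs that are not allowed.
FORBIDDEN = {("core", "render")}


def imported_modules(source: str) -> set[str]:
    """Collect absolute import targets; relative imports are ignored in this sketch."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # "from codeclone import html_report" may import a submodule.
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def layer_violations(module: str, source: str) -> list[str]:
    """Report every import whose (source layer, target layer) pair is forbidden."""
    src_layer = LAYER_OF.get(module)
    violations = []
    for target in sorted(imported_modules(source)):
        dst_layer = LAYER_OF.get(target)
        if (src_layer, dst_layer) in FORBIDDEN:
            violations.append(f"{module} ({src_layer}) -> {target} ({dst_layer})")
    return violations
```

Keeping the rule as data (a layer map plus forbidden pairs) means a new surface only needs a new mapping entry, not new checking logic.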
Contracts
- Core analysis modules do not depend on render/UI modules.
- HTML renderer receives already-computed report data/facts and does not recompute detection semantics.
- MCP layer reuses current pipeline/report semantics and must not introduce a separate analysis truth path.
- The VS Code extension follows the same rule through MCP: it is a client integration surface over canonical report semantics, not a separate analyzer.
- The Claude Desktop bundle follows the same rule: it is a local installation and launcher surface over codeclone-mcp, not a second server.
- The Codex plugin follows the same rule: it is a local discovery and skills surface over codeclone-mcp, not a second analyzer or report model.
- MCP may ship task-specific slim projections (for example, summary-only metrics or inventory counts) as long as canonical report data remains the source of truth and richer detail stays reachable through dedicated tools/sections.
- The same rule applies to bounded semantic routing tools such as help(topic=...): they explain contract meaning and route agents to the safest next step, but they do not introduce a second documentation or truth model.
- The same rule applies to summary cache convenience fields such as freshness and to production-first triage projections built from canonical hotlists/suggestions.
- The same rule also applies to compact interpretation hints such as health_scope, focus, and new_by_source_kind: they clarify projection meaning without introducing a second report truth.
- MCP finding lists may also expose short run/finding ids and slimmer relative location projections, while keeping get_finding(detail_level="full") as the richer per-finding inspection path.
- Baseline, metrics baseline, and cache are validated before being trusted.
Refs:
- codeclone/report/json_contract.py:build_report_document
- codeclone/html_report.py:build_html_report
- codeclone/baseline.py:Baseline.load
- codeclone/metrics_baseline.py:MetricsBaseline.load
- codeclone/cache.py:Cache.load
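The slim-projection rule can be illustrated with a small sketch: a summary view is derived purely from an already-built canonical report, and a separate lookup returns the full record. The report shape (schema_version, findings, id, kind) and both function names are hypothetical stand-ins, not the real report schema or the MCP tool signatures. Counting findings per kind only reshapes canonical data; it does not re-run analysis.

```python
from typing import Any, Mapping


def summary_projection(report: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical slim view derived only from canonical report data; nothing is re-analyzed."""
    counts: dict[str, int] = {}
    for finding in report["findings"]:
        counts[finding["kind"]] = counts.get(finding["kind"], 0) + 1
    return {
        "schema_version": report["schema_version"],
        "finding_counts": dict(sorted(counts.items())),
        # Ids let a client reach the full record through a dedicated lookup.
        "finding_ids": [finding["id"] for finding in report["findings"]],
    }


def finding_full_detail(report: Mapping[str, Any], finding_id: str) -> Mapping[str, Any]:
    """Hypothetical richer per-finding path: return the canonical record itself."""
    for finding in report["findings"]:
        if finding["id"] == finding_id:
            return finding
    raise KeyError(finding_id)
```

Because both functions only read the canonical report, the summary and the full detail can never disagree about what a finding is.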
Invariants (MUST)
- Report serialization is deterministic and schema-versioned.
- UI is render-only and must not change gating semantics.
- Status enums remain domain-owned in baseline/metrics-baseline/cache modules.
Refs:
- codeclone/report/json_contract.py:build_report_document
- codeclone/report/explain.py:build_block_group_facts
- codeclone/baseline.py:BaselineStatus
- codeclone/metrics_baseline.py:MetricsBaselineStatus
- codeclone/cache.py:CacheStatus
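Validate-before-trust and a domain-owned status enum can be sketched together. Everything here is an assumption for illustration: TrustStatus, EXPECTED_SCHEMA, the document layout, and the SHA-256 digest over canonical JSON are hypothetical and do not describe the real Baseline.load or BaselineStatus.

```python
import hashlib
import json
from enum import Enum
from typing import Any, Optional, Tuple

EXPECTED_SCHEMA = "1.0"  # hypothetical schema version


class TrustStatus(str, Enum):
    """Hypothetical status enum owned by the persistence module, not by the UI."""

    OK = "ok"
    MISSING = "missing"
    INVALID = "invalid"
    SCHEMA_MISMATCH = "schema_mismatch"
    INTEGRITY_MISMATCH = "integrity_mismatch"


def payload_digest(payload: Any) -> str:
    # Hash a canonical encoding so the digest does not depend on key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_trusted(text: Optional[str]) -> Tuple[TrustStatus, Optional[dict]]:
    """Return the payload only after every check passes; otherwise a status and None."""
    if text is None:
        return TrustStatus.MISSING, None
    try:
        doc = json.loads(text)
    except json.JSONDecodeError:
        return TrustStatus.INVALID, None
    if not isinstance(doc, dict) or not isinstance(doc.get("payload"), dict):
        return TrustStatus.INVALID, None
    if doc.get("schema_version") != EXPECTED_SCHEMA:
        return TrustStatus.SCHEMA_MISMATCH, None
    if doc.get("integrity") != payload_digest(doc["payload"]):
        return TrustStatus.INTEGRITY_MISMATCH, None
    return TrustStatus.OK, doc["payload"]
```

Callers branch on the returned status; how each non-OK status maps to CLI behavior is covered by 03-contracts-exit-codes.md and 06-baseline.md.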
Failure modes
| Condition | Layer |
|---|---|
| Invalid CLI args / invalid output path | Runtime orchestration (_cli_args, _cli_paths) |
| Baseline schema/integrity mismatch | Baseline contract layer |
| Metrics baseline schema/integrity mismatch | Metrics baseline contract layer |
| Cache corruption/version mismatch | Cache contract layer (fail-open) |
| HTML snippet read failure | Render layer (fallback snippet) |
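The fail-open behavior listed for the cache layer, in the usual sense of the term, means an untrusted cache is reported and treated as empty so the run recomputes instead of aborting. A minimal sketch of such a loader that never raises; CacheLoadStatus, CACHE_VERSION, and the entries layout are hypothetical and are not the real CacheStatus or Cache.load.

```python
import json
from enum import Enum
from typing import Optional, Tuple

CACHE_VERSION = "1.0"  # hypothetical cache format version


class CacheLoadStatus(str, Enum):
    OK = "ok"
    MISSING = "missing"
    CORRUPT = "corrupt"
    VERSION_MISMATCH = "version_mismatch"


def load_cache_fail_open(raw: Optional[str]) -> Tuple[CacheLoadStatus, dict]:
    """Never raise: anything untrusted degrades to an empty cache plus a status."""
    if raw is None:
        return CacheLoadStatus.MISSING, {}
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return CacheLoadStatus.CORRUPT, {}
    if not isinstance(doc, dict) or not isinstance(doc.get("entries"), dict):
        return CacheLoadStatus.CORRUPT, {}
    if doc.get("version") != CACHE_VERSION:
        return CacheLoadStatus.VERSION_MISMATCH, {}
    return CacheLoadStatus.OK, doc["entries"]
```

The status is still surfaced so the run can report why the cache was ignored, even though analysis continues.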
Determinism / canonicalization
- File iteration and group key ordering use explicit sorts.
- Report serializer uses fixed record layouts and sorted keys.
Refs:
- codeclone/scanner.py:iter_py_files
- codeclone/report/json_contract.py:build_report_document
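Explicit sorts, sorted keys, and fixed record layouts together make the serialized output byte-stable regardless of input order. A minimal sketch, assuming a hypothetical member layout (MEMBER_FIELDS) and document shape; it is not build_report_document or the real report schema.

```python
import json
from typing import Any, Iterable, Mapping

# Hypothetical fixed record layout: each member is a positional list in this field order.
MEMBER_FIELDS = ("path", "qualname", "start_line", "end_line")


def canonical_groups(groups: Mapping[str, Iterable[Mapping[str, Any]]]) -> list:
    out = []
    for key in sorted(groups):  # explicit group-key ordering
        # Project each member into the fixed layout, then sort members explicitly.
        members = sorted([member[name] for name in MEMBER_FIELDS] for member in groups[key])
        out.append([key, members])
    return out


def serialize(meta: Mapping[str, Any], groups: Mapping[str, Iterable[Mapping[str, Any]]]) -> str:
    doc = {"meta": dict(meta), "groups": canonical_groups(groups)}
    # sort_keys + fixed separators: identical input data always yields identical text.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

Sorting members as positional lists gives a total order without a custom key, as long as every field in the layout is comparable.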
Locked by tests
- tests/test_report.py::test_report_json_compact_v21_contract
- tests/test_html_report.py::test_html_report_uses_core_block_group_facts
- tests/test_cache.py::test_cache_v13_uses_relpaths_when_root_set
- tests/test_cli_unit.py::test_argument_parser_contract_error_marker_for_invalid_args
- tests/test_architecture.py::test_architecture_layer_violations
Non-guarantees
- Internal module split may evolve in v2.x if public contracts are preserved.
- Import-tree acyclicity is a policy goal, enforced by tests only where explicitly asserted.
Chapter map
| Topic | Primary chapters |
|---|---|
| CLI behavior and failure routing | 03-contracts-exit-codes.md, 09-cli.md |
| Config precedence and defaults | 04-config-and-defaults.md |
| Core processing pipeline | 05-core-pipeline.md |
| Clone baseline trust/compat/integrity | 06-baseline.md |
| Cache trust and fail-open behavior | 07-cache.md |
| Report schema and provenance | 08-report.md, 10-html-render.md |
| MCP agent surface | 20-mcp-interface.md |
| VS Code IDE surface | 21-vscode-extension.md |
| Claude Desktop install surface | 22-claude-desktop-bundle.md |
| Codex plugin surface | 23-codex-plugin.md |
| Health score model | 15-health-score.md |
| Metrics gates and metrics baseline | 15-metrics-and-quality-gates.md |
| Dead-code liveness policy | 16-dead-code-contract.md |
| Suggestions and clone typing | 17-suggestions-and-clone-typing.md |
| Determinism and versioning policy | 12-determinism.md, 14-compatibility-and-versioning.md |