20. MCP Interface
Purpose
Define the current public MCP surface in the 2.0 beta line.
This interface is optional and is installed via the mcp extra. It does
not replace the CLI or the canonical JSON report contract. Instead, it exposes
the existing deterministic analysis pipeline as a read-only MCP server for
AI agents and MCP-capable clients.
It is intentionally budget-aware and triage-first: the MCP surface is shaped as
guided control flow for agentic development, not as a flat dump of report data.
Public surface
- Package extra: `codeclone[mcp]`
- MCP launcher: `codeclone-mcp`
- MCP server: `codeclone/mcp_server.py`
- MCP service adapter: `codeclone/mcp_service.py`
Data model
Current server characteristics:
- optional dependency; base `codeclone` install does not require `mcp`
- transports:
    - `stdio`
    - `streamable-http`
- run storage:
    - in-memory only
    - bounded history (`--history-limit`, default `4`, maximum `10`)
    - latest-run pointer for `codeclone://latest/...` resources
    - the `latest` pointer moves whenever a newer `analyze_*` call registers a run
- run identity:
    - canonical run identity is derived from the canonical report integrity digest
    - MCP payloads expose a short `run_id` handle (first 8 hex chars)
    - MCP tools/resources accept both short and full run ids
    - MCP finding ids are compact by default and may lengthen when needed to stay unique within a run
- analysis modes:
    - `full`
    - `clones_only`
- process-count policy:
    - `processes` is an optional override
    - when omitted, MCP defers to the core CodeClone runtime
- root contract:
    - analysis tools require an absolute repository root
    - relative roots such as `.` are rejected in MCP because the server cwd may differ from the client workspace
    - granular `check_*` tools may omit `root` and use the latest compatible stored run; if `root` is provided, it must also be absolute
- cache policies:
    - `reuse`
    - `off`
    - `refresh` is rejected in MCP because the server is read-only
- summary payload:
    - `run_id`, `version`, `schema`, `mode`
    - `baseline`, `metrics_baseline`, `cache`
    - `cache.freshness` classifies summary cache reuse as `fresh`, `mixed`, or `reused`
    - flattened `inventory` (`files`, `lines`, `functions`, `classes`)
    - flattened `findings` (`total`, `new`, `known`, `by_family`, `production`)
    - flattened `diff` (`new_clones`, `health_delta`)
    - `warnings`, `failures`
    - `analyze_changed_paths` is intentionally more compact than `get_run_summary`: it returns `changed_files`, `health`, `health_delta`, `verdict`, `new_findings`, `resolved_findings`, and an empty `changed_findings` placeholder, while the detailed changed payload stays in `get_report_section(section="changed")`
- workflow guidance:
    - the MCP surface is intentionally agent-guiding rather than list-first
    - the cheapest useful path is designed to be the most obvious path: `get_run_summary` / `get_production_triage` first, then `list_hotspots` or `check_*`, then `get_finding` / `get_remediation`
- finding-list payloads:
    - MCP finding ids are compact projection ids; canonical report ids are unchanged
    - `detail_level="summary"` is the default for list/check/hotspot tools
    - `detail_level="summary"` keeps compact relative `"path:line"` locations
    - `detail_level="normal"` keeps structured `{path, line, end_line, symbol}` locations plus remediation
    - `detail_level="full"` keeps the compatibility-oriented payload, including `priority_factors`, `items`, and per-location `uri`
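
The compact-id rule in the list above (short by default, lengthening only to stay unique within a run) can be illustrated with a shortest-unique-prefix sketch. The helper name `compact_ids`, the minimum length of 8, and the prefix-based scheme itself are assumptions made for illustration; this page does not specify how CodeClone actually derives compact finding ids.

```python
def compact_ids(canonical_ids, min_len=8):
    """Map each canonical id to the shortest prefix (>= min_len) unique in the run.

    Hypothetical helper: the real projection scheme is not documented here.
    """
    ids = sorted(set(canonical_ids))  # sorted for deterministic processing
    result = {}
    for cid in ids:
        length = min(min_len, len(cid))
        # Lengthen the prefix until no other id in the run shares it.
        while length < len(cid) and any(
            other != cid and other.startswith(cid[:length]) for other in ids
        ):
            length += 1
        result[cid] = cid[:length]
    return result
```

Under this sketch, two ids that share their first ten characters receive eleven-character handles, while an unrelated id keeps the default eight.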
The MCP layer does not introduce a separate analysis engine. It calls the current CodeClone pipeline and reuses the canonical report document already produced by the report contract.
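
The run-storage and run-identity rules above (bounded in-memory history, a moving `latest` pointer, short handles taken from the first 8 hex characters of the digest) can be sketched as a small registry. The class name `RunRegistry`, its methods, the lower bound of 1 on the history limit, and the handling of ambiguous short ids are all assumptions for illustration, not the actual service adapter.

```python
from collections import OrderedDict

DEFAULT_HISTORY_LIMIT = 4   # default stated in the text
MAX_HISTORY_LIMIT = 10      # maximum stated in the text


class RunRegistry:
    """Hypothetical in-memory run store keyed by the full report digest."""

    def __init__(self, history_limit=DEFAULT_HISTORY_LIMIT):
        # Assumption: at least one run must be retained.
        if not 1 <= history_limit <= MAX_HISTORY_LIMIT:
            raise ValueError("history_limit must be between 1 and 10")
        self._limit = history_limit
        self._runs = OrderedDict()  # digest -> report, oldest first
        self.latest = None          # digest of the most recently registered run

    def register(self, digest, report):
        # Re-registering an unchanged digest moves it to the newest position.
        self._runs.pop(digest, None)
        self._runs[digest] = report
        self.latest = digest
        while len(self._runs) > self._limit:
            self._runs.popitem(last=False)  # evict the oldest run
        return digest[:8]  # short handle: first 8 hex chars

    def resolve(self, run_id=None):
        """Accept a short handle, a full digest, or None for the latest run."""
        if run_id is None:
            if self.latest is None:
                raise KeyError("no runs registered")
            return self._runs[self.latest]
        matches = [d for d in self._runs if d == run_id or d[:8] == run_id]
        if len(matches) != 1:
            raise KeyError(f"run not found: {run_id}")
        return self._runs[matches[0]]
```

Because nothing is persisted, restarting the process in this sketch loses every run, which matches the in-memory-only contract.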
Tools
Current tool set:
| Tool | Key parameters | Purpose / notes |
|---|---|---|
| `analyze_repository` | absolute `root`, `analysis_mode`, `changed_paths`, `git_diff_ref`, inline thresholds, cache/baseline paths | Run deterministic CodeClone analysis, register the latest run, and return a compact MCP summary. The intended next step is `get_run_summary` or `get_production_triage`, not broad listing by default |
| `analyze_changed_paths` | absolute `root`, `changed_paths` or `git_diff_ref`, `analysis_mode`, inline thresholds | Diff-aware fast path: analyze a repo, attach a changed-files projection, and return a compact changed-files snapshot. The intended next step is `get_report_section(section="changed")` or `get_production_triage` |
| `get_run_summary` | `run_id` | Return the stored summary for the latest or specified run, with slim inventory counts instead of the full file registry; this is the cheapest run-level snapshot and `health` becomes explicit `available=false` when metrics were skipped |
| `get_production_triage` | `run_id`, `max_hotspots`, `max_suggestions` | Return a compact production-first MCP projection: health, cache freshness, production hotspots, production suggestions, and global source-kind counters. This is the default first-pass view for large or noisy repositories |
| `compare_runs` | `run_id_before`, `run_id_after`, `focus` | Compare two registered runs by finding ids and run-to-run health delta; MCP returns short run ids, compact regression/improvement cards, `mixed` for conflicting signals, and `incomparable` with top-level `reason`, empty comparison cards, and `health_delta=null` when roots/settings differ |
| `evaluate_gates` | `run_id`, gate thresholds/booleans | Evaluate CI/gating conditions against an existing run without exiting the process |
| `get_report_section` | `run_id`, `section`, `family`, `path`, `offset`, `limit` | Return a canonical report section. Prefer targeted sections instead of `section="all"` unless the client truly needs the full canonical report. `metrics` is summary-only; `metrics_detail` is paginated/bounded and falls back to summary+hint when unfiltered |
| `list_findings` | `family`, `category`, `severity`, `source_kind`, `novelty`, `sort_by`, `detail_level`, `changed_paths`, `git_diff_ref`, `exclude_reviewed`, pagination | Return deterministically ordered finding groups with filtering and pagination; compact summary detail is the default. Intended for broader filtered review after hotspots or `check_*`, not as the cheapest first-pass call |
| `get_finding` | `finding_id`, `run_id`, `detail_level` | Return one finding by id; defaults to `normal` detail and accepts MCP short ids. Use this after `list_hotspots`, `list_findings`, or `check_*` instead of raising detail on larger lists |
| `get_remediation` | `finding_id`, `run_id`, `detail_level` | Return just the remediation/explainability packet for one finding. Use this when the client needs the fix packet without pulling broader detail payloads |
| `list_hotspots` | `kind`, `run_id`, `detail_level`, `changed_paths`, `git_diff_ref`, `exclude_reviewed`, `limit`, `max_results` | Return one derived hotlist (`most_actionable`, `highest_spread`, `highest_priority`, `production_hotspots`, `test_fixture_hotspots`) with compact summary cards. This is the preferred first-pass triage surface before broader `list_findings` calls |
| `check_clones` | `run_id`, `root`, `path`, `clone_type`, `source_kind`, `max_results`, `detail_level` | Return clone findings from a compatible stored run; `health.dimensions` includes only `clones`. Prefer this narrower tool over `list_findings` when only clone debt is needed |
| `check_complexity` | `run_id`, `root`, `path`, `min_complexity`, `max_results`, `detail_level` | Return complexity hotspots from a compatible stored run; `health.dimensions` includes only `complexity`. Prefer this narrower tool over `list_findings` when only complexity is needed |
| `check_coupling` | `run_id`, `root`, `path`, `max_results`, `detail_level` | Return coupling hotspots from a compatible stored run; `health.dimensions` includes only `coupling`. Prefer this narrower tool over `list_findings` when only coupling is needed |
| `check_cohesion` | `run_id`, `root`, `path`, `max_results`, `detail_level` | Return cohesion hotspots from a compatible stored run; `health.dimensions` includes only `cohesion`. Prefer this narrower tool over `list_findings` when only cohesion is needed |
| `check_dead_code` | `run_id`, `root`, `path`, `min_severity`, `max_results`, `detail_level` | Return dead-code findings from a compatible stored run; `health.dimensions` includes only `dead_code`. Prefer this narrower tool over `list_findings` when only dead code is needed |
| `generate_pr_summary` | `run_id`, `changed_paths`, `git_diff_ref`, `format` | Build a PR-friendly changed-files summary in markdown or JSON. Prefer `markdown` for compact LLM-facing output and reserve `json` for machine post-processing |
| `mark_finding_reviewed` | `finding_id`, `run_id`, `note` | Mark a finding as reviewed in the in-memory MCP session |
| `list_reviewed_findings` | `run_id` | Return the current reviewed findings for the selected run |
| `clear_session_runs` | none | Clear all stored in-memory runs plus ephemeral review/gate/session caches for the current server process |
All analysis/report tools are read-only with respect to repo state. The only
mutable MCP tools are mark_finding_reviewed and clear_session_runs, and
their effects are session-local and in-memory only. analyze_repository,
analyze_changed_paths, and evaluate_gates are
sessionful and may populate or reuse in-memory run state. The granular
check_* tools are read-only over stored runs: use analyze_repository or
analyze_changed_paths first, then query the latest run or pass a specific
run_id.
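
Before the first `analyze_*` call, a client can check its arguments against the documented root, cache-policy, and `changed_paths` rules. The following is a minimal sketch; the function name `validate_analyze_args` and its return shape are hypothetical, and it assumes POSIX-style paths. The server performs its own validation regardless.

```python
from pathlib import PurePosixPath

ALLOWED_CACHE_POLICIES = ("reuse", "off")  # "refresh" is rejected by the read-only server


def validate_analyze_args(root, cache_policy, changed_paths=None):
    """Hypothetical pre-flight check mirroring the documented MCP contract."""
    # Assumption: POSIX-style paths; a real client would use its platform's rules.
    if not PurePosixPath(root).is_absolute():
        raise ValueError(f"root must be absolute, got {root!r}")
    if cache_policy not in ALLOWED_CACHE_POLICIES:
        raise ValueError(f"unsupported cache_policy for MCP: {cache_policy!r}")
    if changed_paths is not None:
        # A comma-separated string is not accepted; the contract is list[str].
        if isinstance(changed_paths, str) or not all(
            isinstance(p, str) for p in changed_paths
        ):
            raise ValueError("changed_paths must be a list of strings")
        if any(PurePosixPath(p).is_absolute() for p in changed_paths):
            raise ValueError("changed_paths entries must be repo-relative")
        changed_paths = list(changed_paths)
    return {"root": root, "cache_policy": cache_policy, "changed_paths": changed_paths}
```

Failing fast on the client side avoids spending a tool call on a request the server would reject with a contract error.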
Budget-aware workflow is intentional:
- first pass: `get_run_summary` or `get_production_triage`
- targeted triage: `list_hotspots` or the relevant `check_*`
- single-finding drill-down: `get_finding`, then `get_remediation`
- bounded metrics drill-down: `get_report_section(section="metrics_detail", family=..., limit=...)`
- PR output: `generate_pr_summary(format="markdown")` unless machine JSON is explicitly needed
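
The workflow above can be written out as a sequence of MCP `tools/call` requests. This is a sketch under stated assumptions: the helpers `tool_call` and `triage_plan` are hypothetical, the request envelope follows the generic MCP JSON-RPC shape rather than anything CodeClone-specific, and in practice the `finding_id` would be taken from an earlier `list_hotspots` response.

```python
def tool_call(request_id, name, **arguments):
    """Build one MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def triage_plan(root, finding_id):
    """Cheapest-first call order: analyze, triage, hotspots, then one finding."""
    steps = [
        ("analyze_repository", {"root": root}),
        ("get_production_triage", {}),
        ("list_hotspots", {"kind": "most_actionable"}),
        ("get_finding", {"finding_id": finding_id}),
        ("get_remediation", {"finding_id": finding_id}),
    ]
    return [
        tool_call(i, name, **args) for i, (name, args) in enumerate(steps, start=1)
    ]
```

Each later step omits `run_id` and therefore relies on the latest stored run, which is why the plan starts with an analysis call.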
Resources
Current fixed resources:
| Resource | Payload | Availability |
|---|---|---|
| `codeclone://latest/summary` | latest run summary projection | always after at least one run |
| `codeclone://latest/triage` | latest production-first triage projection | always after at least one run |
| `codeclone://latest/report.json` | latest canonical report document | always after at least one run |
| `codeclone://latest/health` | latest health score + dimensions | always after at least one run |
| `codeclone://latest/gates` | latest gate evaluation result | only after `evaluate_gates` in current server process |
| `codeclone://latest/changed` | latest changed-files projection | only for a diff-aware latest run |
| `codeclone://schema` | schema-style descriptor for canonical report sections | always available |
Current run-scoped URI templates:
| URI template | Payload | Availability |
|---|---|---|
| `codeclone://runs/{run_id}/summary` | run-specific summary projection | for any stored run |
| `codeclone://runs/{run_id}/report.json` | run-specific canonical report | for any stored run |
| `codeclone://runs/{run_id}/findings/{finding_id}` | run-specific canonical finding group | for an existing finding in a stored run |
Fixed resources and URI templates are convenience views over already
registered runs. They do not trigger fresh analysis by themselves.
If a client needs the freshest truth, it must start a fresh analysis run first
(typically with cache_policy="off"), rather than relying on older session
state behind codeclone://latest/....
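
The fixed resources and URI templates listed above can be recognized with a small parser. This is an illustrative sketch only: `parse_resource_uri` and its return shape are hypothetical, and the real server may route resources differently.

```python
LATEST_VIEWS = {"summary", "triage", "report.json", "health", "gates", "changed"}
RUN_VIEWS = {"summary", "report.json"}
PREFIX = "codeclone://"


def parse_resource_uri(uri):
    """Classify a codeclone:// URI using the shapes listed in the tables above."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a codeclone resource: {uri}")
    parts = uri[len(PREFIX):].split("/")
    if parts == ["schema"]:
        return {"scope": "schema"}
    if len(parts) == 2 and parts[0] == "latest" and parts[1] in LATEST_VIEWS:
        return {"scope": "latest", "view": parts[1]}
    if len(parts) == 3 and parts[0] == "runs" and parts[1] and parts[2] in RUN_VIEWS:
        return {"scope": "run", "run_id": parts[1], "view": parts[2]}
    if (
        len(parts) == 4
        and parts[0] == "runs"
        and parts[1]
        and parts[2] == "findings"
        and parts[3]
    ):
        return {"scope": "finding", "run_id": parts[1], "finding_id": parts[3]}
    # Anything else, including run-scoped triage, is not a listed resource.
    raise ValueError(f"unsupported resource: {uri}")
```

Parsing a URI this way never triggers analysis; it only selects a view over a run that is already stored.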
Contracts
- MCP is read-only:
    - no source-file mutation
    - no baseline update
    - no metrics-baseline update
    - no cache refresh writes
- Session review markers are ephemeral only:
    - stored in memory per server process
    - never written to baseline, cache, or report artifacts
- `streamable-http` defaults to loopback binding. Non-loopback hosts require explicit `--allow-remote` because the server has no built-in authentication.
- MCP must reuse current:
    - pipeline stages
    - baseline trust semantics
    - cache semantics
    - canonical report contract
- Inline MCP design-threshold parameters (`complexity_threshold`, `coupling_threshold`, `cohesion_threshold`) define the canonical design finding universe of that run and are recorded in `meta.analysis_thresholds.design_findings`.
- `get_run_summary` is a deterministic convenience projection derived from the canonical report (`meta`, `inventory`, `findings.summary`, `metrics.summary.health`) plus baseline-diff/gate/changed-files context.
- `get_production_triage` is also a deterministic MCP projection over the same canonical run state (`summary`, `derived.hotlists`, `derived.suggestions`, and canonical finding source scope). It must not create a second analysis or remediation truth path.
- Canonical JSON remains the source of truth for report semantics.
- `list_findings` and `list_hotspots` are deterministic projections over the canonical report, not a separate analysis branch.
- `get_remediation` is a deterministic MCP projection over existing suggestions/explainability data, not a second remediation engine.
- `analysis_mode="clones_only"` must mirror the same metric/dependency skip-semantics as the regular pipeline.
- Missing optional MCP dependency is handled explicitly by the launcher with a user-facing install hint and exit code `2`.
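
The loopback rule for `streamable-http` can be sketched as a bind-host check. The function names and the error text are hypothetical, and treating any non-IP hostname other than `localhost` as remote is an assumption of this sketch, not documented launcher behavior.

```python
import ipaddress


def is_loopback_host(host):
    """Return True for `localhost` and for loopback IP literals."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Assumption: other hostnames are treated as non-loopback.
        return False


def check_bind_host(host, allow_remote=False):
    """Reject non-loopback hosts unless remote access was explicitly allowed."""
    if is_loopback_host(host) or allow_remote:
        return host
    raise ValueError(
        f"refusing to bind {host!r}: pass --allow-remote to expose an "
        "unauthenticated server beyond loopback"
    )
```

Requiring an explicit opt-in makes exposure a deliberate choice, which matters because the server has no built-in authentication.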
Invariants (MUST)
- Tool names are stable public surface.
- Resource URI shapes are stable public surface.
- Read-only vs session-local tool annotations remain accurate.
- `analyze_repository` always registers exactly one latest run.
- `analyze_changed_paths` requires `changed_paths` or `git_diff_ref`.
- `analyze_repository` and `analyze_changed_paths` require an absolute `root`; relative roots like `.` are rejected.
- `changed_paths` is a structured `list[str]` of repo-relative paths, not a comma-separated string payload.
- `analyze_changed_paths` may return the same `run_id` as a previous run when the canonical report digest is unchanged; changed-files state is an overlay, not a second canonical report.
- `get_run_summary` with no `run_id` resolves to the latest stored run.
- `codeclone://latest/...` resources always resolve to the latest stored run in the current MCP server process, not to a globally fresh analysis state.
- Summary-style MCP payloads expose `cache.freshness` as a derived convenience marker; canonical cache metadata remains available only through canonical report/meta surfaces.
- `get_report_section(section="all")` returns the full canonical report document.
- `get_report_section(section="metrics")` returns only `metrics.summary`.
- `get_report_section(section="metrics_detail")` is intentionally bounded: without filters it returns `summary` plus a hint; with `family` and/or `path` it returns a paginated item slice.
- `get_report_section(section="changed")` is available only for diff-aware runs.
- MCP short `run_id` values are session handles over the canonical digest of that run.
- MCP summary/normal finding/location payloads use relative paths only and do not expose absolute `file://` URIs.
- Finding `locations` and `html_anchor` values are stable projections over the current run and do not invent non-canonical ids.
- For the same finding id, `source_kind` remains consistent across `list_findings`, `list_hotspots`, and `get_finding`.
- `get_finding(detail_level="full")` remains the compatibility-preserving full-detail endpoint: `priority_factors` and location `uri` are still available there.
- `compare_runs` is only semantically meaningful when both runs use comparable repository scope/root and analysis settings.
- `compare_runs` exposes top-level `comparable` plus optional `reason`. When roots or effective analysis settings differ, `regressions` and `improvements` become empty lists, `unchanged` and `health_delta` become `null`, and `verdict` becomes `incomparable`.
- `compare_runs.health_delta` is `after.health - before.health` between the two selected comparable runs. It is independent of baseline or metrics-baseline drift.
- `compare_runs.verdict` is intentionally conservative but not one-dimensional: it returns `mixed` when run-to-run finding deltas and `health_delta` disagree.
- `analysis_mode="clones_only"` keeps clone findings fully usable, but MCP surfaces mark `health` as unavailable instead of fabricating zeroed metrics.
- `codeclone://latest/triage` is a latest-only resource; run-specific triage is available via the tool, not via a `codeclone://runs/{run_id}/...` resource URI.
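
The `compare_runs` invariants above can be sketched as follows. Only `incomparable`, `mixed`, the empty/`null` fields, and the `health_delta` formula come from this page; the verdict names `regressed`, `improved`, and `stable`, the input dict shape, the `reason` text, and reporting `unchanged` as a count are assumptions for illustration. The sketch also assumes both runs carry a numeric health score.

```python
def compare_runs(before, after):
    """Compare two runs given as dicts with root, settings, health, finding_ids."""
    if before["root"] != after["root"] or before["settings"] != after["settings"]:
        return {
            "comparable": False,
            "reason": "root or analysis settings differ",  # hypothetical wording
            "regressions": [],
            "improvements": [],
            "unchanged": None,
            "health_delta": None,
            "verdict": "incomparable",
        }
    before_ids = set(before["finding_ids"])
    after_ids = set(after["finding_ids"])
    regressions = sorted(after_ids - before_ids)   # findings new in `after`
    improvements = sorted(before_ids - after_ids)  # findings gone in `after`
    health_delta = after["health"] - before["health"]
    worse = bool(regressions) or health_delta < 0
    better = bool(improvements) or health_delta > 0
    if worse and better:
        verdict = "mixed"      # signals disagree
    elif worse:
        verdict = "regressed"  # hypothetical verdict name
    elif better:
        verdict = "improved"   # hypothetical verdict name
    else:
        verdict = "stable"     # hypothetical verdict name
    return {
        "comparable": True,
        "regressions": regressions,
        "improvements": improvements,
        "unchanged": len(before_ids & after_ids),  # assumption: reported as a count
        "health_delta": health_delta,
        "verdict": verdict,
    }
```

Note that `health_delta` here depends only on the two selected runs, so baseline drift cannot change it.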
Failure modes
| Condition | Behavior |
|---|---|
| `mcp` extra not installed | `codeclone-mcp` prints install hint and exits `2` |
| Invalid root path / invalid numeric config | service raises contract error |
| Requested run missing | service raises run-not-found error |
| Requested finding missing | service raises finding-not-found error |
| Unsupported report section/resource suffix | service raises contract error |
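
The first row of the table (missing `mcp` extra) can be sketched as a launcher pre-check. The function name, the injectable `find_spec` parameter, and the hint wording are hypothetical; only the exit code `2` and the `codeclone[mcp]` extra name come from this page.

```python
import importlib.util
import sys


def launcher_exit_code(find_spec=importlib.util.find_spec, stream=sys.stderr):
    """Return 2 with an install hint when the optional `mcp` package is missing."""
    if find_spec("mcp") is None:
        # Hypothetical hint text; the real launcher message may differ.
        print(
            "MCP support is not installed. Try: pip install 'codeclone[mcp]'",
            file=stream,
        )
        return 2
    return 0
```

A launcher built this way would pass the result to `sys.exit`, so the missing dependency surfaces as a clear message rather than an import traceback.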
Determinism / canonicalization
- MCP run identity is derived from canonical report integrity digest.
- Finding order is inherited from canonical report ordering.
- Hotlists are derived from canonical report data and deterministic derived ids.
- No MCP-only heuristics may change analysis or gating semantics.
- MCP must not re-synthesize design findings from raw metrics after the run; threshold-aware design findings belong to the canonical report document.
Locked by tests
- `tests/test_mcp_service.py::test_mcp_service_analyze_repository_registers_latest_run`
- `tests/test_mcp_service.py::test_mcp_service_lists_findings_and_hotspots`
- `tests/test_mcp_service.py::test_mcp_service_changed_runs_remediation_and_review_flow`
- `tests/test_mcp_service.py::test_mcp_service_granular_checks_pr_summary_and_resources`
- `tests/test_mcp_service.py::test_mcp_service_evaluate_gates_on_existing_run`
- `tests/test_mcp_service.py::test_mcp_service_resources_expose_latest_summary_and_report`
- `tests/test_mcp_server.py::test_mcp_server_exposes_expected_read_only_tools`
- `tests/test_mcp_server.py::test_mcp_server_tool_roundtrip_and_resources`
- `tests/test_mcp_server.py::test_mcp_server_main_reports_missing_optional_dependency`
Non-guarantees
- There is currently no standalone `mcp_api_version` constant.
- In-memory run history does not survive process restart.
- `clear_session_runs` resets the in-memory run registry and related session caches, but does not mutate baseline/cache/report artifacts on disk.
- Client-specific UI/approval behavior is not part of the CodeClone contract.