Skip to content

21. Security Model

Purpose

Describe implemented protections and explicit security boundaries.

Public surface

  • Scanner path validation: codeclone/scanner/__init__.py:iter_py_files
  • File read and parser limits: codeclone/core/worker.py:process_file, codeclone/analysis/parser.py:_parse_limits
  • Baseline/cache validation: codeclone/baseline/*, codeclone/cache/*
  • HTML escaping: codeclone/report/html/primitives/escape.py, codeclone/report/html/assemble.py
  • MCP read-only enforcement: codeclone/surfaces/mcp/*
  • Repository path containment: codeclone/utils/repo_paths.py

Data model

Security-relevant input classes:

  • filesystem paths (root/source/baseline/cache/report)
  • untrusted JSON files (baseline/cache)
  • untrusted source snippets and metadata rendered into HTML
  • MCP request parameters (root, filters, diff refs, cache policy)

Contracts

  • CodeClone parses source text; it does not execute repository Python code.
  • Sensitive root directories are blocked by scanner policy.
  • Symlink traversal outside the root is skipped.
  • HTML escapes text and attribute contexts before embedding.
  • MCP is read-only with respect to source files, baselines, analysis cache (cache.json), and canonical report artifacts.
  • Allowed repo-local writes are explicit and isolated: ephemeral controller coordination (file backend under .codeclone/intents/ or SQLite under .codeclone/db/intents.sqlite3), optional controller audit (.codeclone/db/audit.sqlite3), Engineering Memory/projection state under .codeclone/memory/, and opt-in Platform Observability (.codeclone/db/platform_observability.sqlite3).
  • Platform Observability stores bounded metadata and literal-free SQL fingerprints, never raw payload bodies, and cannot affect analysis truth, gates, baselines, memory facts, or edit authorization.
  • Session-local review markers and in-memory run history do not survive process restart.
  • Five session/coordination tools are marked destructiveHint in MCP metadata (manage_change_intent, start_controlled_change, finish_controlled_change, mark_finding_reviewed, clear_session_runs).
  • --allow-remote is required for non-loopback HTTP bind. It is an explicit operator opt-in, not a substitute for authentication. For streamable-http, CODECLONE_MCP_AUTH_TOKEN is mandatory at server start (see Remote MCP transport). stdio transport remains a local-trust surface on the host.
  • MCP accepts cache policies reuse and off; refresh is rejected at runtime with a contract error.
  • git_diff_ref is validated as a safe single revision expression before any git diff subprocess call.
  • MCP processes is capped to min(requested, os.cpu_count() or 4, 64). This is a resource ceiling only; it does not change analysis results.

Trust boundaries (explicit)

These are documented limits, not hidden guarantees.

Repository path containment

resolve_under_repo_root in codeclone/utils/repo_paths.py is the shared resolver for audit paths, intent-registry DB paths, memory config paths, MCP optional artifacts, and cache wire filepath projection. By default paths must stay under the analysis root after normalization; symlink escapes outside the root are rejected.

Refs:

  • codeclone/utils/repo_paths.py
  • tests/test_repo_paths.py

MCP optional artifact paths

baseline_path, metrics_baseline_path, cache_path, and coverage_xml on analyze_repository / analyze_changed_paths resolve through the same helper. Default: repo-relative only; absolute or out-of-repo paths are rejected. Opt-in: set allow_external_artifacts=true on the analysis tool call when shared monorepo artifacts live outside the scan root (privileged input).

Parameter details: 25-mcp-interface/index.md. Tool copy: help(topic="trust_boundaries").

Refs:

  • codeclone/surfaces/mcp/_session_helpers.py:_resolve_optional_path

Cache checksum semantics

Cache signatures detect corruption and accidental mutation of the canonical cache payload. They are not adversarial authentication against a privileged local attacker who can rewrite .codeclone/cache.json directly.

Refs:

  • codeclone/cache/integrity.py:sign_cache_payload
  • codeclone/cache/integrity.py:verify_cache_payload_signature

Workspace change intents

The workspace intent registry coordinates concurrent edits between processes running as the same local UID on the same host (file backend: .codeclone/intents/; SQLite backend: .codeclone/db/intents.sqlite3 when configured). Records are advisory, TTL-bound (default 1 hour, lease 5 minutes), gitignored, and integrity-checked (SHA-256 over canonical JSON) but not cryptographically authenticated. A same-UID process with repository write access can forge or delete intent records; that UID can already modify source files and baselines directly. Treat intents as coordination hints, not proof of agent identity.

The Cursor plugin may enforce preToolUse by reading this registry through codeclone.workspace_intent (read-only; no lazy-close or writes). The hook gate authorizes edits only for own active or foreign active intents (not stale/queued). That reduces accidental edits without intent; it does not stop a hostile same-UID process.

Refs:

  • codeclone/workspace_intent/gate.py
  • codeclone/surfaces/mcp/_workspace_intents.py
  • codeclone/surfaces/mcp/_session_workflow_mixin.py

Remote MCP transport

Loopback binding is the default. --allow-remote removes the loopback-only transport guard so HTTP MCP can bind on non-local interfaces.

For every streamable-http start (loopback or remote), set CODECLONE_MCP_AUTH_TOKEN to a secret of at least 32 characters. The launcher refuses to bind HTTP transport when the variable is missing or too short; there is no unauthenticated HTTP fallback. Clients must send Authorization: Bearer …; the server validates with hmac.compare_digest (stdlib only). CodeClone does not ship TLS or multi-tenant session management — use a reverse proxy when exposing beyond loopback.

Variable semantics and precedence: 10-config Environment variable overrides.

Refs:

  • codeclone/surfaces/mcp/auth.py
  • codeclone/surfaces/mcp/server.py
  • tests/test_mcp_http_auth.py
  • tests/test_mcp_server.py::test_mcp_server_main_rejects_non_loopback_host_without_opt_in

Platform Observability

The observer is an optional local diagnostics boundary. Its CLI and MCP readers open the telemetry store read-only; the instrumentation writer commits one completed operation and its spans atomically. No network exporter is provided.

The MCP slicer is bounded and declares that its output is CodeClone-development telemetry, not repository quality evidence. See 26-platform-observability.md.

Refs:

  • codeclone/analysis/parser.py:_parse_with_limits
  • codeclone/scanner/__init__.py:SENSITIVE_DIRS
  • codeclone/scanner/__init__.py:iter_py_files
  • codeclone/report/html/primitives/escape.py:_escape_html

Invariants (MUST)

  • Baseline and cache integrity checks use constant-time comparison.
  • Size guards are enforced before parsing baseline/cache JSON.
  • Cache failures degrade safely; baseline trust failures follow the explicit trust model.

Refs:

  • codeclone/baseline/clone_baseline.py:Baseline.verify_integrity
  • codeclone/cache/store.py:Cache.load
  • codeclone/surfaces/cli/workflow.py:_main_impl

Failure modes

Condition Security behavior
Symlink points outside root File skipped
Root under sensitive dirs Validation error
Oversized baseline Baseline rejected
Oversized cache Cache ignored
HTML-injected payload in metadata/source Escaped output
--allow-remote not passed for HTTP Transport rejected
Invalid cache_policy requested in MCP Policy rejected
git_diff_ref fails validation Parameter rejected

Determinism / canonicalization

  • Canonical JSON hashing for baseline/cache prevents formatting-only drift.
  • Security failures map to explicit statuses rather than silent mutation.

Refs:

  • codeclone/baseline/trust.py:_compute_payload_sha256
  • codeclone/cache/integrity.py:canonical_json
  • codeclone/baseline/trust.py:BaselineStatus
  • codeclone/cache/versioning.py:CacheStatus

Locked by tests

  • tests/test_security.py::test_scanner_path_traversal
  • tests/test_scanner_extra.py::test_iter_py_files_symlink_loop_does_not_traverse
  • tests/test_security.py::test_html_report_escapes_user_content
  • tests/test_html_report.py::test_html_report_escapes_script_breakout_payload
  • tests/test_cache.py::test_cache_too_large_warns
  • tests/test_mcp_service.py::test_mcp_service_rejects_refresh_cache_policy_in_read_only_mode
  • tests/test_mcp_service.py::test_mcp_service_caps_process_count_from_request_and_config
  • tests/test_mcp_server.py::test_mcp_server_main_rejects_non_loopback_host_without_opt_in
  • tests/test_repo_paths.py
  • tests/test_mcp_http_auth.py
  • tests/test_security_invariants.py

Non-guarantees

  • Baseline/cache integrity is tamper-evident at file-content level; it is not cryptographic attestation against a privileged attacker.
  • Baseline payload_sha256 and cache signatures protect against accidental corruption and unsynchronized edits; they do not authenticate files against a hostile same-UID writer.
  • Workspace intent files are not signed and must not be treated as proof of which agent declared a change.
  • MCP optional artifact paths outside the scan root require explicit allow_external_artifacts=true; default resolution stays under the repo root.
  • Remote MCP without the auth token env var is not authenticated; with --allow-remote it is not a hardened multi-tenant network service.