
Bootstrap and config

Bootstrap: init, MCP sync, and refresh

The memory store can be created or refreshed through CLI init, MCP auto-sync (default), or explicit MCP refresh. All paths call the same deterministic ingest pipeline (run_memory_init).

CLI init (human / CI)

```shell
codeclone memory init --root /abs/repo
codeclone memory init --root /abs/repo --refresh   # re-ingest + staleness pass
```
```mermaid
sequenceDiagram
    participant H as Human / CI
    participant CLI as codeclone memory init
    participant CC as CodeClone analysis
    participant DB as SQLite store
    H ->> CLI: init [--refresh]
    CLI ->> CC: load cached report or analyze
    CLI ->> CLI: build ingest batch
    Note over CLI: modules, contracts, docs,<br/>tests, risks, git hotspots
    CLI ->> DB: upsert records + evidence
    CLI ->> DB: rebuild FTS index
    opt --refresh
        CLI ->> DB: mark drifted records stale
    end
    CLI ->> H: status summary
```
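CI jobs that drive init from Python can use a thin wrapper. It builds the argv shown above and fails the job on a non-zero exit. This is a minimal sketch with hypothetical helper names. Rejecting a relative --root is an assumption based on the /abs/repo placeholder and is not a documented CLI rule.

```python
import subprocess
from pathlib import PurePosixPath


def build_memory_init_argv(root: str, refresh: bool = False) -> list[str]:
    """Build argv for 'codeclone memory init' (hypothetical helper)."""
    # Assumption: require an absolute POSIX-style root, as in the examples.
    if not PurePosixPath(root).is_absolute():
        raise ValueError(f"--root must be an absolute path, got {root!r}")
    argv = ["codeclone", "memory", "init", "--root", root]
    if refresh:
        # Re-ingest and mark drifted records stale.
        argv.append("--refresh")
    return argv


def ci_memory_init(root: str, refresh: bool = False) -> None:
    # check=True raises CalledProcessError, failing the CI step on non-zero exit.
    subprocess.run(build_memory_init_argv(root, refresh), check=True)
```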

MCP sync (default agent path)

Policy key: mcp_sync_policy in [tool.codeclone.memory] (default bootstrap_if_missing).

| Policy | Auto behavior on get_relevant_memory | Explicit refresh_from_run |
|---|---|---|
| off | No auto sync; DB must exist | Always runs ingest |
| bootstrap_if_missing | Create store from latest MCP run when DB missing | Always runs ingest |
| refresh_when_stale | Re-ingest when stored digest ≠ current run digest | Always runs ingest |
```mermaid
sequenceDiagram
    participant A as Agent
    participant M as MCP
    participant S as mcp_sync
    participant DB as SQLite store
    A ->> M: analyze_repository
    M -->> A: run_id
    A ->> M: start_controlled_change
    M -->> A: edit_allowed=true
    A ->> M: get_relevant_memory(root, intent_id)
    M ->> S: decide + execute (policy)
    alt missing DB + bootstrap_if_missing
        S ->> DB: init ingest from run report
        S -->> M: memory_sync completed
    else digest changed + refresh_when_stale
        S ->> DB: refresh ingest + staleness
        S -->> M: memory_sync completed
    else unchanged
        S -->> M: skip (no memory_sync field)
    end
    M ->> DB: ranked scope query
    M -->> A: records + optional memory_sync
```
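The policy table and diagram can be read as a small decision function. This is a minimal sketch; the names SyncAction, MemoryContractError, and decide_sync are hypothetical. The table does not say what refresh_when_stale does when the DB is missing, so this sketch assumes it bootstraps like bootstrap_if_missing.

```python
from enum import Enum
from typing import Optional

POLICIES = frozenset({"off", "bootstrap_if_missing", "refresh_when_stale"})


class SyncAction(Enum):
    SKIP = "skip"        # no memory_sync field in the response
    INIT = "init"        # create the store from the latest run report
    REFRESH = "refresh"  # re-ingest plus staleness pass


class MemoryContractError(Exception):
    """Hypothetical stand-in for the contract error memory tools return."""


def decide_sync(
    policy: str,
    db_exists: bool,
    stored_digest: Optional[str],
    current_digest: str,
) -> SyncAction:
    """Decide what auto-sync does on get_relevant_memory (hypothetical)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown mcp_sync_policy: {policy!r}")
    if not db_exists:
        if policy == "off":
            # Auto-sync does not run and the DB is missing: contract error.
            raise MemoryContractError(
                "memory DB missing: use refresh_from_run or codeclone memory init"
            )
        # Assumption: refresh_when_stale also bootstraps a missing store.
        return SyncAction.INIT
    if policy == "refresh_when_stale" and stored_digest != current_digest:
        return SyncAction.REFRESH
    return SyncAction.SKIP
```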

Explicit refresh: manage_engineering_memory(action="refresh_from_run", run_id?) always ingests from the selected MCP run. run_id is optional and defaults to the latest run. Call it after analyze_repository when you need fresh system facts without waiting for a policy trigger.
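Over the wire, the explicit refresh is an ordinary MCP tool call. This is a minimal sketch: it serializes a JSON-RPC 2.0 tools/call request using the tool name and arguments given above. The helper name is hypothetical. Any other arguments the tool may require, such as a repository root, are not covered here.

```python
import json
from typing import Optional


def refresh_from_run_request(request_id: int, run_id: Optional[str] = None) -> str:
    """Serialize an MCP tools/call request for refresh_from_run (hypothetical helper)."""
    arguments = {"action": "refresh_from_run"}
    if run_id is not None:
        # Omitting run_id selects the latest MCP run.
        arguments["run_id"] = run_id
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "manage_engineering_memory", "arguments": arguments},
    }
    return json.dumps(message)
```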

Agent rule: MCP sync ingests system records only, the same as CLI init. Agent drafts still require human approval, and MCP never runs approve, reject, or archive.

When auto-sync does not run and the DB is missing, memory tools return a contract error pointing to refresh_from_run or CLI init.

Ingest sources (non-exhaustive):

| Record type | Typical ingest source |
|---|---|
| module_role | Report file inventory |
| contract_note | `contracts/__init__.py` paths (auto or configured) |
| document_link | Configured docs and/or `docs/**/*.md` from inventory |
| test_anchor | Test file inventory |
| risk_note | Complexity / security surfaces from metrics |
| public_surface | MCP / CLI public API inventory |
| contradiction_note | Optional MCP tool-count doc vs snapshot |

Git provenance (Phase 18.6): init attaches git_commit evidence when git is available; optional git hotspot records use git_hotspot_period_days / git_hotspot_min_changes from config.
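Hotspot selection can be illustrated with plain git. For each path, this sketch counts the commits in the last git_hotspot_period_days that touched it, then keeps paths at or above git_hotspot_min_changes. The function names are hypothetical. Deriving hotspots from git log --name-only output is an assumption about how init computes them.

```python
import subprocess
from collections import Counter


def parse_name_only_log(log_output: str) -> Counter[str]:
    """Count commits per path from 'git log --name-only --pretty=format:' output."""
    # Each commit lists each changed path once; blank lines separate commits.
    return Counter(line for line in log_output.splitlines() if line)


def git_hotspots(counts: Counter[str], min_changes: int = 5) -> list[tuple[str, int]]:
    """Paths changed in at least min_changes commits, most-changed first."""
    hot = [(path, n) for path, n in counts.items() if n >= min_changes]
    return sorted(hot, key=lambda item: (-item[1], item[0]))


def collect_git_hotspots(root: str, period_days: int = 90, min_changes: int = 5):
    # Requires git on PATH; defaults mirror the documented config values.
    out = subprocess.run(
        ["git", "-C", root, "log", f"--since={period_days} days ago",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return git_hotspots(parse_name_only_log(out), min_changes)
```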

Refs: codeclone/memory/ingest/mcp_sync.py, codeclone/surfaces/mcp/_session_memory_mixin.py.


Configuration

Memory settings live in nested pyproject.toml tables: [tool.codeclone.memory], [tool.codeclone.memory.ingest], and [tool.codeclone.memory.semantic]. Defaults live in codeclone/config/memory_defaults.py. Keys are validated in codeclone/config/memory_specs.py (flat memory keys) and codeclone/config/memory.py (IngestConfig, SemanticConfig).

Retention and capacity

| Key | Type | Default | Purpose |
|---|---|---|---|
| active_retention_days | int | -1 | Active record retention (-1 = no age purge) |
| stale_retention_days | int | 180 | Stale record retention before vacuum |
| draft_retention_days | int | 14 | Draft candidate retention |
| rejected_retention_days | int | 30 | Rejected draft retention |
| archived_retention_days | int | 365 | Archived record retention |
| receipt_retention_days | int | 90 | Finish-receipt evidence retention |
| max_records | int | 10000 | Hard cap on persisted records |
| max_candidates | int | 1000 | Draft inbox capacity |
| max_evidence_per_record | int | 20 | Evidence rows per record |
| max_statement_chars | int | 1000 | Statement hard limit in characters (target 300, soft warning at 500) |
| max_blast_radius_cache_entries | int | 500 | Cached blast-radius projections per project |
| trajectory_retention_days | int | 365 | Stored trajectory projection retention |
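The retention keys can be read as a per-status age check. This is a minimal sketch with a hypothetical helper name. It measures age from a record's last update, which is an assumption. It also treats any negative value as "no age purge", a rule the table documents only for active records.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Documented retention defaults in days, keyed by record status.
RETENTION_DAYS = {
    "active": -1,   # -1 = no age purge
    "stale": 180,
    "draft": 14,
    "rejected": 30,
    "archived": 365,
}


def is_expired(
    status: str,
    updated_at: datetime,
    now: Optional[datetime] = None,
    retention: Mapping[str, int] = RETENTION_DAYS,
) -> bool:
    """True if a record of this status has outlived its retention window."""
    days = retention[status]
    if days < 0:
        return False  # negative retention disables age-based purge
    now = now or datetime.now(timezone.utc)
    return now - updated_at > timedelta(days=days)
```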

Store backend and sync

| Key | Type | Default | Purpose |
|---|---|---|---|
| backend | str | sqlite | Persistence backend |
| db_path | str | .codeclone/memory/engineering_memory.sqlite3 | SQLite path |
| mcp_sync_policy | str | bootstrap_if_missing | off, bootstrap_if_missing, or refresh_when_stale |

Git hotspots (init ingest)

| Key | Type | Default | Purpose |
|---|---|---|---|
| git_hotspot_period_days | int | 90 | Git history window for hotspot records |
| git_hotspot_min_changes | int | 5 | Minimum commits to emit a hotspot |

Trajectory projection and export

| Key | Type | Default | Purpose |
|---|---|---|---|
| trajectories_enabled | bool | true | Enable trajectory projection from audit core |
| trajectory_export_enabled | bool | false | Gate CLI trajectory export |
| trajectory_export_include_payloads | bool | false | Include step payloads in JSONL export |
| trajectory_export_max_record_bytes | int | 65536 | Per-record export size cap (64 KiB) |
| trajectory_export_max_file_bytes | int | 10485760 | Export file size cap (10 MiB) |
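The export caps can be pictured as limits applied while writing JSONL lines. This is a minimal sketch, not the CodeClone exporter. The "payload" field name is hypothetical. Skipping oversize records and stopping before the file cap is exceeded are assumed policies; the table does not specify them.

```python
import json
from typing import Iterable, Iterator

MAX_RECORD_BYTES = 65536   # trajectory_export_max_record_bytes
MAX_FILE_BYTES = 10485760  # trajectory_export_max_file_bytes


def export_lines(
    records: Iterable[dict],
    include_payloads: bool = False,
    max_record_bytes: int = MAX_RECORD_BYTES,
    max_file_bytes: int = MAX_FILE_BYTES,
) -> Iterator[bytes]:
    """Yield UTF-8 JSONL lines that respect both size caps (hypothetical)."""
    written = 0
    for record in records:
        if not include_payloads:
            # Hypothetical field name: drop step payloads unless opted in.
            record = {k: v for k, v in record.items() if k != "payload"}
        line = (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")
        if len(line) > max_record_bytes:
            continue  # assumption: oversize records are skipped
        if written + len(line) > max_file_bytes:
            break  # assumption: stop before exceeding the file cap
        written += len(line)
        yield line
```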

Projection rebuild coalesce

| Key | Type | Default | Purpose |
|---|---|---|---|
| projection_rebuild_policy | str | off | off or enqueue_when_stale; with enqueue_when_stale, finish may enqueue rebuild jobs |
| projection_rebuild_running_timeout_seconds | int | 1800 | Timeout after which a stale running job is reclaimed |
| projection_rebuild_spawn_worker | bool | true | Spawn a detached worker on enqueue |
| projection_rebuild_coalesce_window_seconds | int | 60 | Window for batching sub-threshold rebuilds (0 = spawn immediately) |
| projection_rebuild_coalesce_min_delta | int | 25 | Active-record delta that bypasses the coalesce window |
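Together, the coalesce keys describe a spawn-now-or-wait decision plus a reclaim rule for stuck jobs. This is a minimal sketch with hypothetical names. Treating the delta threshold as inclusive (>=) and measuring the window from the first enqueue are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoalesceSettings:
    window_seconds: int = 60             # projection_rebuild_coalesce_window_seconds
    min_delta: int = 25                  # projection_rebuild_coalesce_min_delta
    running_timeout_seconds: int = 1800  # projection_rebuild_running_timeout_seconds


def spawn_now(
    active_delta: int,
    seconds_since_first_enqueue: float,
    cfg: CoalesceSettings = CoalesceSettings(),
) -> bool:
    """Decide whether an enqueued rebuild should spawn a worker now."""
    if cfg.window_seconds == 0:
        return True  # window 0 disables coalescing
    if active_delta >= cfg.min_delta:
        return True  # large active-record delta bypasses the window
    # Sub-threshold change: keep batching until the window elapses.
    return seconds_since_first_enqueue >= cfg.window_seconds


def is_reclaimable(
    running_for_seconds: float, cfg: CoalesceSettings = CoalesceSettings()
) -> bool:
    """A job stuck in the running state past the timeout may be reclaimed."""
    return running_for_seconds > cfg.running_timeout_seconds
```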

Ingest paths ([tool.codeclone.memory.ingest])

| Key | Type | Default | Purpose |
|---|---|---|---|
| contract_constants_paths | string list | [] | Contract version files; empty uses auto discovery under codeclone/contracts/ |
| document_link_paths | string list | [] | Doc paths; empty uses README, AGENTS, CLAUDE, and the docs tree |
| mcp_tool_schema_snapshot_path | string or null | null | MCP tool schema snapshot for contradiction checks |
| mcp_tool_count_doc_paths | string list | [] | Docs claiming MCP tool counts (requires snapshot path) |
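Two rules in this table lend themselves to a sketch: the document-link fallback when the list is empty, and the requirement that tool-count docs come with a snapshot path. The sketch below uses a plain dataclass in place of IngestConfig, and the names are hypothetical. The exact fallback filenames (README.md, AGENTS.md, CLAUDE.md) and the docs/**/*.md glob are assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class IngestPaths:
    """Hypothetical mirror of [tool.codeclone.memory.ingest]."""
    contract_constants_paths: list[str] = field(default_factory=list)
    document_link_paths: list[str] = field(default_factory=list)
    mcp_tool_schema_snapshot_path: Optional[str] = None
    mcp_tool_count_doc_paths: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Tool-count contradiction checks need a snapshot to compare against.
        if self.mcp_tool_count_doc_paths and self.mcp_tool_schema_snapshot_path is None:
            raise ValueError("mcp_tool_count_doc_paths requires mcp_tool_schema_snapshot_path")


def resolve_document_links(cfg: IngestPaths, root: Path) -> list[Path]:
    """Configured doc paths, or the assumed default set when the list is empty."""
    if cfg.document_link_paths:
        return [root / p for p in cfg.document_link_paths]
    # Assumed filenames for the README / AGENTS / CLAUDE fallback.
    candidates = [root / name for name in ("README.md", "AGENTS.md", "CLAUDE.md")]
    found = [p for p in candidates if p.is_file()]
    # A missing docs directory simply yields no matches.
    return found + sorted((root / "docs").glob("**/*.md"))
```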

Semantic batching ([tool.codeclone.memory.semantic])

| Key | Type | Default | Purpose |
|---|---|---|---|
| enabled | bool | false | Opt-in semantic sidecar |
| backend | str | lancedb | Vector backend |
| index_path | str | .codeclone/memory/semantic_index.lance | LanceDB path |
| embedding_provider | str | diagnostic | diagnostic, fastembed, local_model, or api |
| embedding_model | str | provider default | Model name, e.g. BAAI/bge-small-en-v1.5 for fastembed |
| embedding_cache_dir | str | .codeclone/memory/fastembed | Model cache directory |
| allow_model_download | bool | false | Permit fastembed downloads |
| dimension | int | 256 | Vector dimension for the diagnostic provider |
| max_results | int | 20 | Semantic search cap |
| index_audit | bool | true | Project audit summaries into the index |
| embed_max_documents_per_batch | int | 64 | Maximum documents per embedding batch |
| embed_max_padded_tokens_per_batch | int | 8192 | Padded-token budget per embedding batch |
| projection_token_estimator | str | chars_approx | chars_approx or tiktoken |
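The two batching caps interact because a batch is padded to its longest document. Under that reading, a batch costs (number of documents) × (longest document in tokens). This is a minimal greedy sketch of that reading. The four-characters-per-token estimate is an assumed stand-in for chars_approx, and the function names are hypothetical. A single document over budget still gets its own batch rather than being split.

```python
from typing import Iterable, Iterator


def estimate_tokens(text: str) -> int:
    # Assumed chars_approx heuristic: about 4 characters per token, minimum 1.
    return max(1, len(text) // 4)


def batch_documents(
    docs: Iterable[str], max_docs: int = 64, max_padded_tokens: int = 8192
) -> Iterator[list[str]]:
    """Greedy batching: padded cost = batch size * longest document's tokens."""
    batch: list[str] = []
    longest = 0
    for doc in docs:
        tokens = estimate_tokens(doc)
        new_longest = max(longest, tokens)
        too_many = len(batch) + 1 > max_docs
        too_costly = (len(batch) + 1) * new_longest > max_padded_tokens
        if batch and (too_many or too_costly):
            yield batch  # close the current batch and start a new one
            batch, new_longest = [], tokens
        batch.append(doc)
        longest = new_longest
    if batch:
        yield batch
```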

Environment variable overrides for memory and semantic fields are listed in 10-config, under Environment variable overrides (Engineering Memory table).

Unknown keys under [tool.codeclone.memory.semantic] are contract errors (Pydantic extra="forbid" on SemanticConfig).

Refs:

  • codeclone/config/memory_specs.py
  • codeclone/config/memory_defaults.py
  • codeclone/config/memory.py