Bootstrap and config
Bootstrap: init, MCP sync, and refresh¶
The memory store can be created or refreshed through CLI init, MCP auto-sync
(default), or explicit MCP refresh. All paths call the same deterministic
ingest pipeline (run_memory_init).
CLI init (human / CI)¶
codeclone memory init --root /abs/repo
codeclone memory init --root /abs/repo --refresh # re-ingest + staleness pass
sequenceDiagram
participant H as Human / CI
participant CLI as codeclone memory init
participant CC as CodeClone analysis
participant DB as SQLite store
H ->> CLI: init [--refresh]
CLI ->> CC: load cached report or analyze
CLI ->> CLI: build ingest batch
Note over CLI: modules, contracts, docs,<br/>tests, risks, git hotspots
CLI ->> DB: upsert records + evidence
CLI ->> DB: rebuild FTS index
opt --refresh
CLI ->> DB: mark drifted records stale
end
CLI ->> H: status summary
MCP sync (default agent path)¶
Policy key: mcp_sync_policy in [tool.codeclone.memory] (default
bootstrap_if_missing).
| Policy | Auto behavior on get_relevant_memory |
Explicit refresh_from_run |
|---|---|---|
off |
No auto sync; DB must exist | Always runs ingest |
bootstrap_if_missing |
Create store from latest MCP run when DB missing | Always runs ingest |
refresh_when_stale |
Re-ingest when stored digest ≠ current run digest | Always runs ingest |
sequenceDiagram
participant A as Agent
participant M as MCP
participant S as mcp_sync
participant DB as SQLite store
A ->> M: analyze_repository
M -->> A: run_id
A ->> M: start_controlled_change
M -->> A: edit_allowed=true
A ->> M: get_relevant_memory(root, intent_id)
M ->> S: decide + execute (policy)
alt missing DB + bootstrap_if_missing
S ->> DB: init ingest from run report
S -->> M: memory_sync completed
else digest changed + refresh_when_stale
S ->> DB: refresh ingest + staleness
S -->> M: memory_sync completed
else unchanged
S -->> M: skip (no memory_sync field)
end
M ->> DB: ranked scope query
M -->> A: records + optional memory_sync
Explicit refresh: manage_engineering_memory(action="refresh_from_run", run_id?)
always ingests from the selected MCP run (defaults to latest). Use after
analyze_repository when you need fresh system facts without waiting for policy
triggers.
Agent rule: MCP sync ingests system records only — same as CLI init.
Human approve is still required for agent drafts. MCP never runs
approve/reject/archive.
When auto-sync does not run and the DB is missing, memory tools return a contract
error pointing to refresh_from_run or CLI init.
Ingest sources (non-exhaustive):
| Record type | Typical ingest source |
|---|---|
module_role |
Report file inventory |
contract_note |
contracts/__init__.py paths (auto or configured) |
document_link |
Configured docs and/or docs/**/*.md from inventory |
test_anchor |
Test file inventory |
risk_note |
Complexity / security surfaces from metrics |
public_surface |
MCP / CLI public API inventory |
contradiction_note |
Optional MCP tool-count doc vs snapshot |
Git provenance (Phase 18.6): init attaches git_commit evidence when git is
available; optional git hotspot records use
git_hotspot_period_days / git_hotspot_min_changes from config.
Refs: codeclone/memory/ingest/mcp_sync.py, codeclone/surfaces/mcp/_session_memory_mixin.py.
Configuration¶
Nested tables in pyproject.toml under [tool.codeclone.memory],
[tool.codeclone.memory.ingest], and [tool.codeclone.memory.semantic].
Defaults live in codeclone/config/memory_defaults.py; key validation in
codeclone/config/memory_specs.py (flat memory keys) and
codeclone/config/memory.py (IngestConfig, SemanticConfig).
Retention and capacity¶
| Key | Type | Default | Purpose |
|---|---|---|---|
active_retention_days |
int | -1 |
Active record retention (-1 = no age purge) |
stale_retention_days |
int | 180 |
Stale record retention before vacuum |
draft_retention_days |
int | 14 |
Draft candidate retention |
rejected_retention_days |
int | 30 |
Rejected draft retention |
archived_retention_days |
int | 365 |
Archived record retention |
receipt_retention_days |
int | 90 |
Finish-receipt evidence retention |
max_records |
int | 10000 |
Hard cap on persisted records |
max_candidates |
int | 1000 |
Draft inbox capacity |
max_evidence_per_record |
int | 20 |
Evidence rows per record |
max_statement_chars |
int | 1000 |
Statement hard limit (target 300, soft warn 500) |
max_blast_radius_cache_entries |
int | 500 |
Cached blast-radius projections per project |
trajectory_retention_days |
int | 365 |
Stored trajectory projection retention |
Store backend and sync¶
| Key | Type | Default | Purpose |
|---|---|---|---|
backend |
str | sqlite |
Persistence backend |
db_path |
str | .codeclone/memory/engineering_memory.sqlite3 |
SQLite path |
mcp_sync_policy |
str | bootstrap_if_missing |
off | bootstrap_if_missing | refresh_when_stale |
Git hotspots (init ingest)¶
| Key | Type | Default | Purpose |
|---|---|---|---|
git_hotspot_period_days |
int | 90 |
Git history window for hotspot records |
git_hotspot_min_changes |
int | 5 |
Minimum commits to emit a hotspot |
Trajectory projection and export¶
| Key | Type | Default | Purpose |
|---|---|---|---|
trajectories_enabled |
bool | true |
Enable trajectory projection from audit core |
trajectory_export_enabled |
bool | false |
Gate CLI trajectory export |
trajectory_export_include_payloads |
bool | false |
Include step payloads in JSONL export |
trajectory_export_max_record_bytes |
int | 65536 |
Per-record export size cap |
trajectory_export_max_file_bytes |
int | 10485760 |
Export file size cap |
Projection rebuild coalesce¶
| Key | Type | Default | Purpose |
|---|---|---|---|
projection_rebuild_policy |
str | off |
off | enqueue_when_stale — finish may enqueue jobs |
projection_rebuild_running_timeout_seconds |
int | 1800 |
Stale running-job reclaim timeout |
projection_rebuild_spawn_worker |
bool | true |
Spawn detached worker on enqueue |
projection_rebuild_coalesce_window_seconds |
int | 60 |
Batch sub-threshold rebuilds (0 = immediate spawn) |
projection_rebuild_coalesce_min_delta |
int | 25 |
Active-record delta bypassing coalesce window |
Ingest paths ([tool.codeclone.memory.ingest])¶
| Key | Type | Default | Purpose |
|---|---|---|---|
contract_constants_paths |
string list | [] |
Contract version files; empty uses auto discovery under codeclone/contracts/ |
document_link_paths |
string list | [] |
Doc paths; empty uses README, AGENTS, CLAUDE, and docs tree |
mcp_tool_schema_snapshot_path |
string or null | null |
MCP tool schema snapshot for contradiction checks |
mcp_tool_count_doc_paths |
string list | [] |
Docs claiming MCP tool counts (requires snapshot path) |
Semantic batching ([tool.codeclone.memory.semantic])¶
| Key | Type | Default | Purpose |
|---|---|---|---|
enabled |
bool | false |
Opt-in semantic sidecar |
backend |
str | lancedb |
Vector backend |
index_path |
str | .codeclone/memory/semantic_index.lance |
LanceDB path |
embedding_provider |
str | diagnostic |
diagnostic | fastembed | local_model | api |
embedding_model |
str | provider default | e.g. BAAI/bge-small-en-v1.5 for fastembed |
embedding_cache_dir |
str | .codeclone/memory/fastembed |
Model cache directory |
allow_model_download |
bool | false |
Permit fastembed downloads |
dimension |
int | 256 |
Diagnostic provider dimension |
max_results |
int | 20 |
Semantic search cap |
index_audit |
bool | true |
Project audit summaries into index |
embed_max_documents_per_batch |
int | 64 |
Embedding batch document cap |
embed_max_padded_tokens_per_batch |
int | 8192 |
Embedding batch token budget |
projection_token_estimator |
str | chars_approx |
chars_approx | tiktoken |
Environment overrides for memory and semantic fields: 10-config Environment variable overrides (Engineering Memory table).
Unknown keys under [tool.codeclone.memory.semantic] are contract errors
(Pydantic extra="forbid" on SemanticConfig).
Refs:
codeclone/config/memory_specs.pycodeclone/config/memory_defaults.pycodeclone/config/memory.py