Engineering Memory¶

Purpose¶

Engineering Memory is a local, evidence-linked knowledge store for a Python repository. It captures structural facts, document links, git provenance, and governed human/agent notes — then surfaces them to AI agents before and during controlled edits.

Not a second analyzer

Memory reads from the same canonical report, contracts, docs, tests, and git facts as CodeClone analysis. It does not run a separate LLM inference path, mutate source files, or override structural findings.

Not analysis cache

The SQLite database under .codeclone/memory/ is a governed memory contract, separate from analysis cache (cache.json) and baselines (codeclone.baseline.json).

Status¶

Phase	Capability	Surface
18.1	Store, init ingest, CLI `init\\|status\\|for-path\\|search`	CLI
18.2	Scoped retrieval, ranking	MCP `get_relevant_memory`, `query_engineering_memory`
18.3	Refresh staleness, scope staleness, retention	CLI `stale`, `vacuum`; finish hook marks scope stale
18.4	Draft governance, claim validation	MCP `manage_engineering_memory`; CLI `review-candidates\\|approve\\|reject\\|archive`
18.5	Scope coverage, finish proposals	`finish_controlled_change(propose_memory=true)`
18.6	FTS search (`match_mode`), git hotspots, schema 1.1, Rich CLI	CLI `--match`; MCP `filters.match_mode`
18.7	MCP sync from analysis runs	`mcp_sync_policy`; auto bootstrap on `get_relevant_memory`; `refresh_from_run`
20	Optional semantic retrieval (LanceDB sidecar)	`[tool.codeclone.memory.semantic]`; CLI `memory semantic *`; MCP/CLI search `--semantic`
22	Audit event core for trajectory replay	`AUDIT_EVENT_CORE_VERSION`; audit `event_core_json` / `workflow_id`
23	Trajectory projection + SQLite storage	CLI `memory trajectory status\\|rebuild\\|list\\|show\\|search`
24	Scoped trajectory retrieval + memory evidence	MCP `get_relevant_memory.trajectories[]`; `query_engineering_memory(mode=trajectory_*)`
25	Disabled-by-default local JSONL export profiles	CLI `memory trajectory export --profile ... --out ...`
26	Patch Trail persistence + scoped retrieval	`memory_trajectory_patch_trails`; `patch_trail_summary` on scoped retrieval
28	Incremental projection jobs	Watermarked trajectory rebuild, semantic hash-skip, coalesced worker
Live	Trajectory quality and passport analytics	Quality/complexity contract, anomalies, agents, dashboard
Live	Experience Layer	Distillation job, scoped `experiences[]`, `promote_experience` draft bridge

Schema version constant: ENGINEERING_MEMORY_SCHEMA_VERSION in codeclone/contracts/__init__.py (currently 1.7).

Semantic index format (separate contract): SEMANTIC_INDEX_FORMAT_VERSION (currently 2) in the same module. The vector sidecar is independent of the SQLite memory schema version.

Architecture¶

graph TB
    subgraph Sources["Deterministic sources"]
        CR[Canonical Report]
        CT[Contracts / docs / tests]
        GIT[Git provenance]
        RC[Finish receipts / audit]
    end

    subgraph MemoryStore["Engineering Memory (SQLite)"]
        REC[memory_records]
        SUB[memory_subjects]
        EV[memory_evidence]
        FTS[memory_fts FTS5]
        TRAJ[trajectory projection]
        EXP[Experience projection]
    end

    subgraph Surfaces["Read / write surfaces"]
        CLI["codeclone memory *"]
        MCP_R["MCP read tools"]
        MCP_W["MCP draft writes"]
        HUM["Human approve CLI"]
    end

    CR -->|init / refresh ingest| MemoryStore
    CT -->|init / refresh ingest| MemoryStore
    GIT -->|init / refresh ingest| MemoryStore
    RC -->|propose_from_receipt / finish hook| MemoryStore
    RC --> TRAJ --> EXP
    MemoryStore --> CLI
    MemoryStore --> MCP_R
    MCP_W -->|draft only| MemoryStore
    HUM -->|approve / reject / archive| MemoryStore
    style MemoryStore stroke: #6366f1, stroke-width: 2px
    style MCP_W fill: #fef9c3
    style HUM fill: #dcfce7

Module ownership:

Module	Role
`codeclone/memory/sqlite_store.py`	SQLite persistence, FTS sync, subject dedup
`codeclone/memory/ingest/*`	Init/refresh batch builders from report + git + docs
`codeclone/memory/retrieval/*`	Scoped ranking and query router
`codeclone/memory/semantic/*`	Projections, LanceDB sidecar, rebuild, search hits
`codeclone/memory/embedding/*`	Embedding providers (`diagnostic` default)
`codeclone/memory/governance.py`	Draft candidates, approve/reject, claim validation
`codeclone/memory/staleness.py`	Refresh-time and scope-time staleness
`codeclone/memory/jobs/store.py`	Coalesced projection rebuild jobs (schema 1.3+)
`codeclone/memory/trajectory/*`	Audit → trajectory projection, Patch Trail, export
`codeclone/memory/experience/*`	Deterministic Experience distillation + persistence
`codeclone/config/memory*.py`	`[tool.codeclone.memory]` resolution
`codeclone/surfaces/cli/memory*.py`	Human CLI + Rich rendering
`codeclone/surfaces/mcp/_session_memory_mixin.py`	MCP memory tools + finish hook

Refs:

codeclone/memory/ingest/runner.py:run_memory_init
codeclone/memory/retrieval/service.py:query_engineering_memory
codeclone/surfaces/mcp/_session_memory_mixin.py

Normative detail:

Regressions and UX fixes (2.1.0a1)¶

These are documentation anchors for shipped fixes — see CHANGELOG.md Fixed for the full controller list.

Area	Symptom	Fix (code truth)
VS Code session/audit webviews	Payload footprint table showed zeros for workflow metrics	Audit footprint JSON uses `calls` and `tokens` in `top_workflows`; the webview maps both legacy and mistaken field names (`workspaceInsightsRenderer.js`).
CLI session stats	Import / duplication issues	Collection lives in `codeclone/controller_insights/`; CLI renders only (`surfaces/cli/session_stats.py`).
MCP vs CLI insights	Session stats logic must not live only in MCP	IDE-only tools `get_workspace_session_stats` / `get_controller_audit_trail` share the same collectors as `--session-stats` / `--audit`.
Patch verify	Identical before/after run accepted	`after_run_not_new` for `python_structural` and `governance_config` profiles.
Finish hygiene	Over-blocking on foreign out-of-scope dirt	Unattributed out-of-scope dirt is advisory; blocking reasons are `missing_evidence` and `foreign_dirty_overlap`.