Optional semantic retrieval (Phase 20)¶

Semantic search is opt-in and off by default (enabled = false in codeclone/config/memory_defaults.py). It does not replace FTS: keyword search still runs first; when the index is available, vector proximity merges extra candidates and adjusts ranking (semantic_proximity * 0.3 in codeclone/memory/retrieval/ranking.py).

flowchart LR
    Q[Query text] --> FTS[FTS5 memory_fts]
    Q --> EMB[EmbeddingProvider.embed_query]
    DOCS[Index projections] --> DOCEMB[EmbeddingProvider.embed_documents]
    EMB --> VEC[LanceDB k-NN]
    DOCEMB --> INDEX[LanceDB rebuild]
    INDEX --> VEC
    FTS --> MERGE[Candidate union]
    VEC --> MERGE
    MERGE --> RANK[Rank + semantic weight]
    RANK --> OUT[search payload + semantic block]

Prerequisites (all required for semantic.used: true):

memory.semantic.enabled = true in effective config.
Optional vector backend installed: pip install 'codeclone[semantic-lancedb]'. For semantic-quality local embeddings, install codeclone[semantic-local] instead (or combine semantic-lancedb + semantic-fastembed).
Index built at index_path (default .codeclone/memory/semantic_index.lance) via manage_engineering_memory(action="rebuild_semantic_index") (MCP agents) or codeclone memory semantic rebuild (CLI/CI).

Minimal local semantic-quality setup:

pip install 'codeclone[semantic-local]'

[tool.codeclone.memory.semantic]
enabled = true
embedding_provider = "fastembed"
allow_model_download = true  # or pre-populate embedding_cache_dir and keep false

codeclone memory init --root .
# Agents (MCP): manage_engineering_memory(action=rebuild_semantic_index)
codeclone memory semantic rebuild --root .
codeclone memory semantic search "recover after MCP restart" --root .
codeclone memory search "recover after MCP restart" --semantic --root .

Use codeclone[semantic-lancedb] only when you intentionally want the derived sidecar with the deterministic diagnostic provider; it is stable, but not semantic-quality recall.

Degraded states (never crash read paths):

Condition	Index behavior	Search `semantic` block
`enabled=false`	`NullSemanticIndex`	`used: false`, `reason: disabled`
Enabled, index missing	`UnavailableSemanticIndex` (`not_built`)	FTS only; `used: false`
Enabled, LanceDB extra missing	`UnavailableSemanticIndex` (`lancedb_not_installed`)	FTS only; explicit `semantic rebuild` fails clear
Provider unavailable	`semantic_reason` set (e.g. FastEmbed extra/model unavailable)	FTS only

The index is a derived, rebuildable sidecar — not updated on the memory write hot path. Rebuild is idempotent on projection text_hash (codeclone/memory/semantic/rebuild.py).

Embedding providers¶

Provider	Status	Meaning
`diagnostic` (default)	Always available	`DeterministicHashEmbeddingProvider`: sha256-derived unit vectors. Stable across runs, not semantic-quality recall. CLI prints an advisory when `provider=diagnostic`.
`fastembed`	Optional: `codeclone[semantic-fastembed]`	Local ONNX embeddings through FastEmbed. Default model is `BAAI/bge-small-en-v1.5` (`384` dimensions). Query text uses a `query:` prefix; indexed records use `passage:`. Model download is disabled unless `allow_model_download=true`, so air-gapped installs can pre-populate `embedding_cache_dir`.
`local_model`	Raises `MemorySemanticUnavailableError`	Reserved compatibility literal; use `fastembed` for community local semantic search.
`api`	Raises `MemorySemanticUnavailableError`	Reserved for remote/API providers.

Model id for diagnostic: diagnostic-hash-v1 (codeclone/memory/embedding/__init__.py). Model id for FastEmbed: fastembed:<embedding_model>.

What gets indexed¶

Memory record types (INDEXED_MEMORY_TYPES in codeclone/memory/semantic/projection.py):

contract_note, change_rationale, risk_note, architecture_decision, contradiction_note, protocol_rule, human_note.

Not semantically indexed (served by exact subject / path match instead): module_role, test_anchor, document_link, public_surface, stale_marker.

Audit incidents when index_audit=true (default) and audit_enabled=true with a readable audit DB — projected from controller_events.summary only (never payload_json). Event types:

intent.declared, patch_contract.violated, workspace.conflict_detected, baseline_abuse.detected, claim_validation.violated, review_receipt.created.

Empty audit summaries are skipped.

Trajectory passports when trajectories are enabled — projected via project_trajectory() from bounded trajectory fields (summary, outcome, subjects; see codeclone/memory/semantic/projection.py). Long texts become multiple index units under format 2 (below).

Surfaces¶

Surface	Semantic flag
`query_engineering_memory(mode=search, semantic=true)`	MCP
`manage_engineering_memory(action=rebuild_semantic_index)`	MCP (build sidecar)
`codeclone memory search --semantic`	CLI
`codeclone memory semantic search`	CLI (requires built index)
`codeclone memory semantic rebuild`	CLI (build sidecar)
`codeclone memory semantic probe [--exact-tokens] [--json]`	CLI — per-lane projection length stats
VS Code `codeclone.memory.searchSemantic` (default `true`)	Passes MCP `semantic` on IDE search; server opt-in unchanged
`get_relevant_memory`	No semantic parameter (scoped ranking only)

Search responses include a top-level semantic object:

Field	When set
`used`	`true` only when index + provider + rebuild succeeded
`backend`	e.g. `lancedb` from index status
`provider`	Config label (`diagnostic`, …)
`model`	Provider `model_id` when used
`index_version`	`SEMANTIC_INDEX_FORMAT_VERSION` when used
`reason`	Degrade reason when `used` is false

codeclone memory semantic probe emits per-lane stats under lanes.{memory,audit,trajectory}. Default estimator is cheap planning; pass --exact-tokens to load the FastEmbed tokenizer and measure passage-prefixed texts that rebuild would embed. With --exact-tokens, trajectory uses the same chunker as rebuild: lanes.trajectory.chunking reports {source_documents, index_units, multi_chunk_sources} and documents counts index units (not raw projections). Lane-level overflow_examples (up to five) list index units still above the model window. Chunking reserves passage prefix and model special tokens; rebuild fails closed with SemanticChunkingInvariantError when a chunk cannot be proven to fit.

Format 2 indexes long trajectory projections as multiple chunk rows linked by parent_id (single-chunk trajectories keep the trajectory id as id). Hybrid search oversamples trajectory k × TRAJECTORY_SEARCH_OVERSAMPLE (4), collapses chunk hits to the best score per trajectory, and sets matched_chunk_index / matched_chunk_count on the returned trajectory hit.

When semantic hits audit rows, payload.audit_events lists hydrated incidents (event type, bounded summary preview, score) alongside memory records.

Refs:

codeclone/memory/retrieval/service.py:_handle_semantic_search_mode
codeclone/memory/semantic/__init__.py:resolve_semantic_index
tests/test_semantic_projection.py, tests/test_semantic_rebuild.py, tests/test_semantic_chunking.py, tests/test_semantic_projection_probe.py, tests/test_mcp_memory_semantic.py, tests/test_cli_memory_semantic.py