Semantic Search (MSRL)
Hybrid vector + keyword search across your entire vault
Overview
Semantic Search (MSRL - Markdown Structured Retrieval Library) enables AI to search your entire Obsidian vault using meaning, not just keywords. It combines:
- 75% Vector Similarity - Understands semantic meaning
- 25% BM25 Keyword Matching - Finds exact terms
This is a killer feature for solo developers who need AI to understand their entire project context, not just structured entities.
How It Works
graph LR
A[Your Query] --> B[MSRL Engine]
B --> C[Vector Search]
B --> D[Keyword Search]
C --> E[Hybrid Scoring]
D --> E
E --> F[Ranked Results]
F --> G[Exact Excerpts]
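The hybrid scoring step in the diagram can be sketched as a weighted sum. This is a minimal illustration, not MSRL's actual implementation: it assumes both scores are already normalized to [0, 1] and that the 75/25 split is applied as a plain linear combination. `ScoredHit`, `hybridScore`, and `rankHits` are hypothetical names.

```typescript
// Weights taken from the overview: 75% vector similarity, 25% BM25.
const VECTOR_WEIGHT = 0.75;
const BM25_WEIGHT = 0.25;

interface ScoredHit {
  id: string;
  vectorScore: number; // assumed to be normalized to [0, 1]
  bm25Score: number;   // assumed to be normalized to [0, 1]
}

// Assumption: the hybrid score is a plain weighted sum of the two scores.
function hybridScore(hit: ScoredHit): number {
  return VECTOR_WEIGHT * hit.vectorScore + BM25_WEIGHT * hit.bm25Score;
}

// Rank hits by descending hybrid score and keep the best topK.
function rankHits(hits: ScoredHit[], topK: number): ScoredHit[] {
  return hits
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => hybridScore(b) - hybridScore(a))
    .slice(0, topK);
}

const ranked = rankHits(
  [
    { id: "keyword-heavy", vectorScore: 0.4, bm25Score: 0.95 },
    { id: "meaning-heavy", vectorScore: 0.85, bm25Score: 0.72 },
  ],
  8,
);
console.log(ranked.map((h) => h.id));
```

With these weights, a hit that matches in meaning outranks one that matches mostly on keywords. The real engine may normalize or round scores differently, so values computed this way need not match the scores returned by search_docs exactly.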
Key Features
- Hierarchical Indexing - Respects Markdown heading structure
- Exact Provenance - Returns file path, heading, and character position
- Incremental Updates - Only re-indexes changed files (~750ms)
- Auto-Reindexing - Watches for file changes automatically
- Immutable Snapshots - Atomic updates with rollback capability
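To make hierarchical indexing and exact provenance concrete, here is a rough sketch of splitting a Markdown file into sections by heading while recording each section's heading path and character range. It is an illustration only: `splitByHeadings` and `Section` are hypothetical names, it handles only `#`-style headings, it does not skip headings inside fenced code blocks, and MSRL's real chunking may differ.

```typescript
interface Section {
  headingPath: string; // e.g. "Security > Authentication"; "" before the first heading
  startChar: number;   // offset of the section's first character (UTF-16 code units)
  endChar: number;     // offset just past the section's last character
}

function splitByHeadings(markdown: string): Section[] {
  const sections: Section[] = [];
  const stack: { level: number; title: string }[] = []; // open headings, outermost first
  let current: Section = { headingPath: "", startChar: 0, endChar: 0 };
  let offset = 0;

  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      // Close the previous section at the start of this heading line.
      current.endChar = offset;
      if (current.endChar > current.startChar) sections.push(current);

      const level = (match[1] ?? "").length;
      const title = (match[2] ?? "").trim();
      // Drop headings at the same or a deeper level, then open this one.
      while (stack.length > 0 && (stack[stack.length - 1]?.level ?? 0) >= level) {
        stack.pop();
      }
      stack.push({ level, title });
      current = {
        headingPath: stack.map((h) => h.title).join(" > "),
        startChar: offset,
        endChar: offset,
      };
    }
    offset += rawLine.length + 1; // +1 for the "\n" removed by split
  }

  // The last section runs to the end of the file.
  current.endChar = markdown.length;
  if (current.endChar > current.startChar) sections.push(current);
  return sections;
}

const doc =
  "# Security\nOverview text\n## Authentication\nWe chose JWT tokens.\n# Storage\nPostgreSQL.";
for (const s of splitByHeadings(doc)) {
  console.log(s.headingPath, s.startChar, s.endChar);
}
```

Keeping the offsets alongside the heading path is what lets a search result point back to an exact location in the source file.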
Use Cases
1. Finding Implementation Details
Human: "How does authentication work in our system?"
AI: Uses search_docs → finds relevant sections across specs, decisions, and code
AI: "I found 8 relevant sections. The main authentication flow is..."
2. Discovering Related Decisions
Human: "What decisions did we make about database choice?"
AI: Uses search_docs with filter → finds all DEC-* files mentioning databases
AI: "Found 3 decisions: DEC-001 (PostgreSQL), DEC-015 (Redis), DEC-023 (Backup strategy)"
3. Researching Past Work
Human: "Find all mentions of error handling in the vault"
AI: Uses search_docs → searches across all markdown files
AI: "Found 12 relevant sections across stories, tasks, and documentation"
MCP Tools
search_docs
Semantic search across all documents in the vault.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | ✅ | Search query text |
| top_k | integer | ❌ | Number of results (default: 8, max: 50) |
| max_excerpt_chars | integer | ❌ | Max excerpt length (default: 4000, max: 20000) |
| filters.doc_uri_prefix | string | ❌ | Filter to documents starting with path (e.g., "stories/") |
| filters.doc_uris | array | ❌ | Filter to specific document URIs |
| filters.heading_path_contains | string | ❌ | Filter to sections containing heading text |
| include_scores | boolean | ❌ | Include detailed scores in results |
Example:
{
  "query": "authentication implementation",
  "top_k": 10,
  "filters": {
    "doc_uri_prefix": "decisions/"
  }
}
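A client can enforce the documented defaults and limits before sending a request. The sketch below models the parameters from the table as TypeScript types and fills in or clamps top_k and max_excerpt_chars. `normalizeSearchDocsRequest` and `clampInt` are hypothetical helpers, and clamping on the client side is an assumption: the server may instead reject out-of-range values.

```typescript
interface SearchDocsFilters {
  doc_uri_prefix?: string; // e.g. "stories/"
  doc_uris?: string[];
  heading_path_contains?: string;
}

interface SearchDocsRequest {
  query: string;
  top_k?: number;             // default 8, max 50
  max_excerpt_chars?: number; // default 4000, max 20000
  filters?: SearchDocsFilters;
  include_scores?: boolean;
}

// Return `fallback` when the value is missing or not a finite number;
// otherwise round down and clamp into [1, max].
// The lower bound of 1 is an assumption; no minimum is documented.
function clampInt(value: number | undefined, fallback: number, max: number): number {
  if (value === undefined || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(1, Math.floor(value)));
}

function normalizeSearchDocsRequest(req: SearchDocsRequest): SearchDocsRequest {
  if (req.query.trim() === "") {
    throw new Error("query must not be empty"); // query is the only required field
  }
  return {
    ...req,
    top_k: clampInt(req.top_k, 8, 50),
    max_excerpt_chars: clampInt(req.max_excerpt_chars, 4000, 20000),
  };
}

const request = normalizeSearchDocsRequest({
  query: "authentication implementation",
  top_k: 500, // above the documented maximum, so it is clamped to 50
  filters: { doc_uri_prefix: "decisions/" },
});
console.log(JSON.stringify(request, null, 2));
```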
Returns:
{
  "results": [
    {
      "doc_uri": "decisions/DEC-001_PostgreSQL_Database.md",
      "heading_path": "Security > Authentication",
      "excerpt": "We chose JWT tokens for authentication...",
      "start_char": 1234,
      "end_char": 1456,
      "vector_score": 0.85,
      "bm25_score": 0.72,
      "hybrid_score": 0.81
    }
  ],
  "count": 1,
  "took_ms": 45
}
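Because each result carries its document URI, heading path, and character range, a client can turn results into citations or group them per document. A small sketch follows; `formatCitation` and `groupByDocument` are hypothetical helpers, and the score fields are typed as optional on the assumption that they may be absent when include_scores is not set.

```typescript
interface SearchDocsResult {
  doc_uri: string;
  heading_path: string;
  excerpt: string;
  start_char: number;
  end_char: number;
  vector_score?: number;
  bm25_score?: number;
  hybrid_score?: number;
}

// One-line provenance string: file, heading path, and character range.
function formatCitation(r: SearchDocsResult): string {
  return `${r.doc_uri} [${r.heading_path}] chars ${r.start_char}-${r.end_char}`;
}

// Collect results under the document they came from, preserving result order.
function groupByDocument(results: SearchDocsResult[]): Record<string, SearchDocsResult[]> {
  const groups: Record<string, SearchDocsResult[]> = {};
  for (const r of results) {
    const bucket = groups[r.doc_uri] ?? [];
    bucket.push(r);
    groups[r.doc_uri] = bucket;
  }
  return groups;
}

const sample: SearchDocsResult = {
  doc_uri: "decisions/DEC-001_PostgreSQL_Database.md",
  heading_path: "Security > Authentication",
  excerpt: "We chose JWT tokens for authentication...",
  start_char: 1234,
  end_char: 1456,
  hybrid_score: 0.81,
};
console.log(formatCitation(sample));
console.log(Object.keys(groupByDocument([sample, sample])));
```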
msrl_status
Check the status of the semantic search index.
Parameters: None
Returns:
{
  "state": "ready",
  "snapshot_id": "snapshot_20260124_143022",
  "stats": {
    "doc_count": 156,
    "node_count": 892,
    "leaf_count": 2341,
    "shard_count": 3
  },
  "watcher": {
    "active": true,
    "last_event": "2026-01-24T14:30:22Z"
  }
}
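A client can gate searches on this response. The sketch below treats "ready" as the only searchable state, since it is the only value shown here; any other state value, and the `checkStatus` helper itself, are assumptions made for illustration.

```typescript
interface MsrlStatus {
  state: string;
  snapshot_id: string;
  stats: {
    doc_count: number;
    node_count: number;
    leaf_count: number;
    shard_count: number;
  };
  watcher: {
    active: boolean;
    last_event: string;
  };
}

interface StatusCheck {
  searchable: boolean;
  warnings: string[];
}

function checkStatus(status: MsrlStatus): StatusCheck {
  const warnings: string[] = [];
  // Assumption: only "ready" means the index can serve queries.
  const searchable = status.state === "ready";
  if (!searchable) {
    warnings.push(`index state is "${status.state}", not "ready"`);
  }
  if (status.stats.doc_count === 0) {
    warnings.push("index contains no documents");
  }
  if (!status.watcher.active) {
    warnings.push("file watcher is inactive; changes may not be re-indexed automatically");
  }
  return { searchable, warnings };
}

const readyStatus: MsrlStatus = {
  state: "ready",
  snapshot_id: "snapshot_20260124_143022",
  stats: { doc_count: 156, node_count: 892, leaf_count: 2341, shard_count: 3 },
  watcher: { active: true, last_event: "2026-01-24T14:30:22Z" },
};
console.log(checkStatus(readyStatus));
```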
Integration with search_entities
The search_entities tool supports a semantic search mode, enabled with semantic=true.
This combines:
- MSRL semantic search across entity files
- Entity-specific filtering (type, status, workstream)
- Structured entity results
Setup
1. Model Download
On first run, MSRL downloads the bge-m3 embedding model (~615MB) from HuggingFace:
# Pre-download models (optional)
npx obsidian-accomplishments-mcp download-models
# Or set custom model path
export MSRL_MODEL_PATH=/path/to/models/bge-m3/model.onnx
2. Initial Indexing
The first index build takes ~30-60 seconds for a typical vault (100-200 files):
[MSRL] Building initial index...
[MSRL] Indexed 156 documents, 2341 chunks
[MSRL] Index ready in 42.3s
3. Incremental Updates
After initial indexing, updates are fast (~750ms per file).
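One common way to decide which files need re-indexing is to compare content fingerprints between runs. The sketch below illustrates that idea only: how MSRL actually detects changes is not described here, and `hashText`, `fingerprint`, and `diffFingerprints` are hypothetical names. A production system would more likely use a cryptographic hash or file modification times.

```typescript
// djb2-style (XOR variant) 32-bit string hash.
// Good enough for a demo; not collision-resistant.
function hashText(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

type Fingerprints = Record<string, number>; // file path -> content hash

function fingerprint(files: Record<string, string>): Fingerprints {
  const out: Fingerprints = {};
  for (const path of Object.keys(files)) {
    out[path] = hashText(files[path] ?? "");
  }
  return out;
}

interface ChangeSet {
  added: string[];
  changed: string[];
  removed: string[];
}

// Compare two fingerprint maps and report which paths need work.
function diffFingerprints(previous: Fingerprints, current: Fingerprints): ChangeSet {
  const changes: ChangeSet = { added: [], changed: [], removed: [] };
  for (const path of Object.keys(current)) {
    const before = previous[path];
    if (before === undefined) changes.added.push(path);
    else if (before !== current[path]) changes.changed.push(path);
  }
  for (const path of Object.keys(previous)) {
    if (current[path] === undefined) changes.removed.push(path);
  }
  return changes;
}

const before = fingerprint({
  "stories/login.md": "v1",
  "decisions/DEC-001.md": "PostgreSQL",
  "notes/old.md": "x",
});
const after = fingerprint({
  "stories/login.md": "v2",
  "decisions/DEC-001.md": "PostgreSQL",
  "tasks/new.md": "todo",
});
console.log(diffFingerprints(before, after));
```

Only the added and changed paths would need to be re-embedded, which is why a single-file update can be much cheaper than a full rebuild.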
Best Practices
✅ DO
- Use for discovery - "Find all mentions of X"
- Use for context - "What did we decide about Y?"
- Use with filters - Narrow down to specific folders
- Use semantic mode - When meaning matters more than exact keywords
❌ DON'T
- Don't use for listing - Use list_files instead
- Don't use for specific entities - Use get_entity instead
- Don't use for structured queries - Use search_entities without semantic=true
Performance
| Operation | Time | Notes |
|---|---|---|
| Initial index (200 files) | ~45s | One-time cost |
| Single file update | ~750ms | Incremental |
| Search query | ~50-200ms | Depends on vault size |
| Model load | ~2s | On server start |
Troubleshooting
Index Not Ready
Solution: Wait for initial indexing to complete. Check status with msrl_status.
Model Download Failed
Solution: Check your internet connection, or download the model manually (see Model Download under Setup).
Slow Search
Solution:
- Reduce top_k parameter
- Use doc_uri_prefix filter to narrow search
- Check vault size (>1000 files may need optimization)
Next Steps
- AI Workflows - Learn how to use semantic search in AI conversations
- MCP Tools Reference - Complete tool documentation
- Best Practices - Tips for effective search queries