Collection
The Collection class is the primary public API for the library. MCP tools, CLI commands, and direct integrations all go through this class.
Quick Start
from pathlib import Path
from markdown_vault_mcp import Collection
# Basic read-only collection
collection = Collection(source_dir=Path("/path/to/vault"))
stats = collection.build_index()
print(f"Indexed {stats.documents_indexed} documents")
# Search
results = collection.search("query text", limit=10)
for r in results:
    print(f"{r.path}: {r.title} (score: {r.score:.2f})")
# Read a document
note = collection.read("Journal/note.md")
print(note.content)
API Reference¶
Collection(*, source_dir, index_path=None, embeddings_path=None, embedding_provider=None, read_only=True, state_path=None, indexed_frontmatter_fields=None, required_frontmatter=None, chunk_strategy='heading', on_write=None, git_strategy=None, git_pull_interval_s=0, exclude_patterns=None, attachment_extensions=None, max_attachment_size_mb=10.0)
Facade over the FTS5 index, the vector index, and the change tracker.
Instantiate once per collection root. Call build_index() (or let
lazy initialisation handle it) before querying.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source_dir | Path | Root directory of the markdown collection. | required |
| index_path | Path \| None | Path to the SQLite index file. | None |
| embeddings_path | Path \| None | Base path for the | None |
| embedding_provider | EmbeddingProvider \| None | Provider used to generate embeddings. Required when embeddings_path is set. | None |
| read_only | bool | When True, write operations raise ReadOnlyError. | True |
| state_path | Path \| None | Path to the hash-state JSON file. | None |
| indexed_frontmatter_fields | list[str] \| None | Frontmatter keys whose values are promoted to the | None |
| required_frontmatter | list[str] \| None | If provided, documents missing any listed field are excluded from the index entirely. | None |
| chunk_strategy | str \| ChunkStrategy | | 'heading' |
| on_write | WriteCallback \| None | Optional callback invoked after every successful write operation. | None |
| git_strategy | GitWriteStrategy \| None | Optional git strategy used for background git tasks (e.g. periodic fetch + ff-only updates). | None |
| git_pull_interval_s | int | Interval in seconds for periodic pulls. | 0 |
pause_writes()
Block all write operations until the context exits.
Write operations are queued (blocked on the lock) rather than being rejected. Reads and search remain unblocked at the Python level.
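The queue-rather-than-reject behaviour can be illustrated with a minimal sketch built on a single threading.Lock. WriteGate and its methods are hypothetical stand-ins, not the library's internals. The sketch assumes only that every write takes the same lock that pause_writes() holds.

```python
import threading
import time
from contextlib import contextmanager


class WriteGate:
    """Hypothetical stand-in: writers share one lock, readers never take it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.log: list[str] = []

    @contextmanager
    def pause_writes(self):
        # While this block is active, any write() call waits on the lock.
        with self._lock:
            yield

    def write(self, text: str) -> None:
        with self._lock:
            self.log.append(text)

    def read(self) -> list[str]:
        # No lock here, so reads proceed even while writes are paused.
        return list(self.log)


gate = WriteGate()
with gate.pause_writes():
    writer = threading.Thread(target=gate.write, args=("queued",))
    writer.start()
    time.sleep(0.05)        # give the writer time to reach the lock
    before = gate.read()    # [] : the write is still blocked
writer.join()
after = gate.read()         # ['queued'] : the write ran once the pause ended
```

Because threading.Lock is not reentrant, calling write() from the thread that holds the pause would deadlock in this sketch. How the library handles that case is not covered here.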
sync_from_remote_before_index()
One-time git fetch + ff-only update before build_index().
Intended to run during server startup before the initial index build. No reindex is triggered here because build_index() will scan the updated working tree.
start()
Start background tasks for this Collection (e.g. git pull loop).
stop()
Stop background tasks (e.g. git pull loop) without closing the collection.
Safe to call multiple times. A no-op if no pull loop was started. The SQLite connection and write callback remain open; only the pull loop thread is signalled to stop.
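The start()/stop() contract describes a background pull loop that can be stopped repeatedly without error. It can be sketched with a thread and a threading.Event. PullLoop is hypothetical. Treating an interval of 0 as "no loop" is an assumption of this sketch, not documented behaviour.

```python
from __future__ import annotations

import threading
from typing import Callable


class PullLoop:
    """Hypothetical periodic task: start() spawns a thread, stop() is idempotent."""

    def __init__(self, interval_s: float, pull: Callable[[], None]) -> None:
        self._interval_s = interval_s
        self._pull = pull                  # e.g. one fetch + ff-only update
        self._stop = threading.Event()
        self._thread: threading.Thread | None = None

    def start(self) -> None:
        # Assumption for this sketch: an interval <= 0 means "no loop".
        if self._interval_s <= 0 or self._thread is not None:
            return
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Event.wait() returns True as soon as stop() sets the event,
        # so the loop exits without waiting out a full interval.
        while not self._stop.wait(self._interval_s):
            self._pull()

    def stop(self) -> None:
        # Signal, then join; later calls find no thread and simply return.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Like the documented stop(), this sketch touches only the loop thread and leaves any other resources alone.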
close()
Release resources held by the collection.
Flushes deferred embeddings and pending write callbacks, then closes the SQLite connection and git strategy.
search(query, *, limit=10, mode='keyword', filters=None, folder=None)
Search the collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Search string. | required |
| limit | int | Maximum number of results to return. | 10 |
| mode | Literal['keyword', 'semantic', 'hybrid'] | | 'keyword' |
| filters | dict[str, str] \| None | Dict of | None |
| folder | str \| None | If provided, restrict results to documents in this folder (and its sub-folders). | None |
Returns:
| Type | Description |
|---|---|
| list[SearchResult] | List of SearchResult objects ordered by relevance. |
Raises:
| Type | Description |
|---|---|
| ValueError | If mode is |
read(path)
Read the full content of a document from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative document path. | required |
Returns:
| Type | Description |
|---|---|
| NoteContent \| None | A NoteContent object, or None if the file does not exist. |
list(*, folder=None, pattern=None, include_attachments=False)
List documents (and optionally attachments) in the collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder | str \| None | If provided, only return documents in this folder (and sub-folders). | None |
| pattern | str \| None | Unix glob matched against the relative path. | None |
| include_attachments | bool | When True, attachments are listed alongside documents. | False |
Returns:
| Type | Description |
|---|---|
| list[NoteInfo \| AttachmentInfo] | List of NoteInfo and, optionally, AttachmentInfo objects. |
build_index(*, force=False)
Scan source_dir and build the FTS index.
If the index already contains documents and force is False,
this is a no-op. force=True drops all existing data and rebuilds
from scratch.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force | bool | When True, drop all existing data and rebuild from scratch. | False |
Returns:
| Type | Description |
|---|---|
| IndexStats | An IndexStats object. |
reindex()
Incrementally update the index based on file changes.
Returns:
| Type | Description |
|---|---|
| ReindexResult | A ReindexResult describing the changes applied. |
build_embeddings(*, force=False)
Build the vector index from all chunks currently in the FTS index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force | bool | If | False |
Returns:
| Type | Description |
|---|---|
| int | Total number of chunks embedded. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
embeddings_status()
Return status information about the vector index.
Returns:
| Type | Description |
|---|---|
| dict | Dict with keys |
list_folders()
Return all distinct folder values across the indexed collection.
Returns:
| Type | Description |
|---|---|
| list[str] | Sorted list of folder strings. |
list_tags(field='tags')
Return all distinct values indexed for a given frontmatter field.
If field was not in indexed_frontmatter_fields, returns [].
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| field | str | Frontmatter key to query. | 'tags' |
Returns:
| Type | Description |
|---|---|
| list[str] | Sorted list of distinct value strings. |
get_toc(path)
Return table of contents for a document.
Queries the FTS sections table for headings and prepends the document title as a synthetic H1 entry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path to the document. | required |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of heading dicts ordered by position, with the document title prepended as level 1. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no document exists at the given path. |
get_backlinks(path)
Return all documents that link to the given document.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path of the target document. | required |
Returns:
| Type | Description |
|---|---|
| list[BacklinkInfo] | List of BacklinkInfo objects, one for each document that contains a link pointing to path. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no document exists at the given path. |
get_outlinks(path)
Return all links from the given document to other documents.
The exists field on each OutlinkInfo object
indicates whether the target document is currently indexed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path of the source document. | required |
Returns:
| Type | Description |
|---|---|
| list[OutlinkInfo] | List of OutlinkInfo objects, one for each link originating from path. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no document exists at the given path. |
get_broken_links(*, folder=None)
Return all links whose target does not exist in the collection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder | str \| None | If provided, restrict to source documents in this folder (exact match or sub-folder prefix). | None |
Returns:
| Type | Description |
|---|---|
| list[BrokenLinkInfo] | List of BrokenLinkInfo objects. |
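The broken-link query and the "exact match or sub-folder prefix" folder rule, which get_broken_links() and get_recent() share, can be sketched as follows. The sketch assumes an in-memory map from each source path to its resolved target paths. broken_links() and in_folder() are hypothetical helpers, not library functions.

```python
from __future__ import annotations


def in_folder(path: str, folder: str | None) -> bool:
    """Exact folder match or sub-folder prefix; None means no restriction."""
    if folder is None:
        return True
    parent = path.rsplit("/", 1)[0] if "/" in path else ""
    folder = folder.strip("/")
    # Compare whole path segments so "Journal" does not match "Journal2".
    return parent == folder or parent.startswith(folder + "/")


def broken_links(
    links: dict[str, list[str]], existing: set[str], folder: str | None = None
) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target is not an existing document."""
    return [
        (src, tgt)
        for src, targets in sorted(links.items())
        if in_folder(src, folder)
        for tgt in targets
        if tgt not in existing
    ]
```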
get_similar(path, *, limit=10)
Return the most semantically similar chunks from other documents.
Uses the stored embedding vectors for path (averaged across its
chunks) to compute cosine similarity against the chunks of all other
documents. No re-embedding is needed. Results are at chunk granularity,
so the same document may appear multiple times if it has many chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path of the reference document. | required |
| limit | int | Maximum number of results to return. | 10 |
Returns:
| Type | Description |
|---|---|
| list[SearchResult] | List of SearchResult objects ordered by descending similarity. Returns an empty list if embeddings are not configured or the document has no stored vectors. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no document exists at the given path. |
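The averaging step described for get_similar() can be sketched in pure Python. The sketch assumes chunk vectors are available as (path, chunk_id, vector) triples. similar_chunks() and cosine() are hypothetical helpers, and the real index may use a different storage layout or vector library.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_chunks(path, chunks, limit=10):
    """chunks: (doc_path, chunk_id, vector) triples already stored in the index."""
    own = [vec for doc, _, vec in chunks if doc == path]
    if not own:
        return []  # no stored vectors for this document
    # Average the document's chunk vectors into a single query vector.
    centroid = [sum(col) / len(own) for col in zip(*own)]
    scored = [
        (cosine(centroid, vec), doc, cid)
        for doc, cid, vec in chunks
        if doc != path
    ]
    # One entry per chunk, so a document can appear several times.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```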
get_recent(*, limit=20, folder=None)
Return the most recently modified documents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of documents to return. | 20 |
| folder | str \| None | If provided, restrict to documents in this folder (exact match or sub-folder prefix). | None |
Returns:
| Type | Description |
|---|---|
| list[NoteInfo] | List of NoteInfo objects ordered by modification time (most recent first). |
get_context(path, *, similar_limit=5, link_limit=10)
Return a consolidated context dossier for a document.
Combines backlinks, outlinks, similar notes, folder peers, and indexed frontmatter tags into a single response, saving the caller multiple round trips.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path of the document. | required |
| similar_limit | int | Maximum number of similar notes to include. | 5 |
| link_limit | int | Maximum number of backlinks and outlinks to include. | 10 |
Returns:
| Type | Description |
|---|---|
| NoteContext | A NoteContext object. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no document exists at the given path. |
get_orphan_notes()
get_most_linked(*, limit=10)
Return the documents with the most inbound links.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| limit | int | Maximum number of results to return. | 10 |
Returns:
| Type | Description |
|---|---|
| list[MostLinkedNote] | List of MostLinkedNote objects sorted by backlink_count descending. |
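Ranking by inbound links amounts to counting, for each note, how many documents link to it. This sketch assumes links are already resolved to vault-relative paths and that each linking document counts once. most_linked() is a hypothetical helper, not the library's query.

```python
from collections import Counter


def most_linked(links: dict[str, list[str]], limit: int = 10) -> list[tuple[str, int]]:
    """links maps each source note to the notes it links to."""
    counts: Counter = Counter()
    for targets in links.values():
        # dict.fromkeys de-duplicates while keeping order, so a document that
        # links to the same note twice still counts as one backlink.
        for tgt in dict.fromkeys(targets):
            counts[tgt] += 1
    # most_common() sorts by count, highest first.
    return counts.most_common(limit)
```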
get_connection_path(source, target, max_depth=10)
Return the shortest undirected path between two notes.
Treats the link graph as undirected: a link in either direction counts as a connection. Uses BFS with a configurable depth cap.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | Vault-relative path of the starting note. | required |
| target | str | Vault-relative path of the destination note. | required |
| max_depth | int | Maximum path length in edges. Out-of-range values are clamped. | 10 |
Returns:
| Type | Description |
|---|---|
| list[str] \| None | Ordered list of vault-relative paths from source to target (inclusive), or None if no path is found. |
Raises:
| Type | Description |
|---|---|
| ValueError | If source or target is not found in the index. |
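The search described above (an undirected graph searched with BFS under a depth cap) can be sketched as follows. connection_path() is a hypothetical helper that works on an in-memory link map. Unlike the documented method, it returns None for unknown notes instead of raising ValueError.

```python
from collections import deque


def connection_path(links, source, target, max_depth=10):
    """Shortest undirected path from source to target, or None within max_depth."""
    # A link in either direction joins two notes, so build a symmetric adjacency map.
    adj: dict = {}
    for src, targets in links.items():
        for tgt in targets:
            adj.setdefault(src, set()).add(tgt)
            adj.setdefault(tgt, set()).add(src)

    if source == target:
        return [source]
    parent = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # the depth cap: never expand past max_depth edges
        for nxt in sorted(adj.get(node, ())):
            if nxt in parent:
                continue
            parent[nxt] = node
            if nxt == target:
                # Walk parent pointers back to source, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append((nxt, depth + 1))
    return None
```

BFS visits notes in order of distance from source, so the first time target is reached the path is a shortest one.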
get_history(path=None, since=None, until=None, limit=20)
Return commits that touched a note or the whole vault.
When path is None, queries the full vault history. Returns an
empty list for vaults whose source directory is not inside a git
repository.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| None | Vault-relative path of the note to filter on. | None |
| since | str \| None | ISO 8601 datetime string or git date expression. | None |
| until | str \| None | ISO 8601 datetime string or git date expression. | None |
| limit | int | Maximum number of commits to return. Out-of-range values are clamped. | 20 |
Returns:
| Type | Description |
|---|---|
| list[HistoryEntry] | List of HistoryEntry objects, newest-first. Empty list when the vault has no git history or the note has no commits in the given range. |
Raises:
| Type | Description |
|---|---|
| ValueError | If path is provided but fails path validation. |
get_diff(path, since_sha=None, since_timestamp=None, per_commit=False, limit=None)
Return the diff of a note between a reference point and HEAD.
Exactly one of since_sha or since_timestamp must be supplied.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Vault-relative path of the note to diff. Must end with | required |
| since_sha | str \| None | A commit SHA (full or abbreviated, at least 4 hex digits) to diff from. Mutually exclusive with since_timestamp. | None |
| since_timestamp | str \| None | ISO 8601 datetime string, resolved to the most recent commit at or before that point. | None |
| per_commit | bool | When True, return a list of CommitDiff objects instead of a single unified diff string. | False |
| limit | int \| None | When per_commit is | None |
Returns:
| Type | Description |
|---|---|
| str \| list[CommitDiff] | A unified diff string when per_commit is False, or a list of CommitDiff objects when it is True. Empty when there are no changes in the given range. |
Raises:
| Type | Description |
|---|---|
| ValueError | If not exactly one of since_sha / since_timestamp is supplied, if since_sha contains invalid characters, or if the resolved ref is not found in history. |
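The argument rules for get_diff() can be checked up front: exactly one reference point must be supplied, and a SHA must have at least 4 hex digits. check_diff_args() is a hypothetical helper. The library's own validation and error messages may differ.

```python
import re

# At least 4 hex digits, matching "full or abbreviated" SHAs.
_SHA_RE = re.compile(r"[0-9a-fA-F]{4,}")


def check_diff_args(since_sha=None, since_timestamp=None):
    """Raise ValueError unless exactly one reference point is supplied."""
    if (since_sha is None) == (since_timestamp is None):
        raise ValueError("supply exactly one of since_sha or since_timestamp")
    if since_sha is not None and not _SHA_RE.fullmatch(since_sha):
        raise ValueError(f"since_sha is not a hex commit SHA: {since_sha!r}")
```

Rejecting non-hex characters early also keeps arbitrary text from reaching git as a revision argument.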
stats()
Return collection-wide statistics.
Returns:
| Type | Description |
|---|---|
| CollectionStats | A CollectionStats object. |
read_attachment(path)
Read the binary content of a non-.md attachment.
Delegates to DocumentManager.read_attachment().
write_attachment(path, content, if_match=None)
Create or overwrite a non-.md attachment.
Delegates to DocumentManager.write_attachment().
write(path, content, frontmatter=None, if_match=None)
Create or overwrite a document.
Creates intermediate directories as needed. If frontmatter is provided, it is serialised as a YAML header at the top of the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative document path. | required |
| content | str | Markdown body (excluding frontmatter). | required |
| frontmatter | dict \| None | Optional frontmatter dict serialised as a YAML header. | None |
| if_match | str \| None | Optional etag for optimistic concurrency. The write is rejected if it does not match the current file hash. | None |
Returns:
| Type | Description |
|---|---|
| WriteResult | A WriteResult object. |
Raises:
| Type | Description |
|---|---|
| ReadOnlyError | If the collection is read-only. |
| ConcurrentModificationError | If if_match is provided and does not match the current file hash. |
| ValueError | If path escapes the source directory. |
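The if_match check is a form of optimistic concurrency: remember the file's hash when reading, and refuse to write if it has changed since. This is a minimal sketch that assumes the etag is a SHA-256 hex digest of the file bytes, since the hash algorithm is not documented here. guarded_write() and file_etag() are hypothetical, and ConcurrentModificationError is a local stand-in.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


class ConcurrentModificationError(Exception):
    """Local stand-in for the library exception of the same name."""


def file_etag(path: Path) -> str:
    # Assumption for this sketch: the etag is the SHA-256 hex digest of the file bytes.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def guarded_write(path: Path, text: str, if_match: str | None = None) -> str:
    """Write text, refusing if if_match is given and the file has changed."""
    if if_match is not None:
        current = file_etag(path) if path.exists() else None
        if current != if_match:
            raise ConcurrentModificationError(f"{path} changed since it was read")
    path.parent.mkdir(parents=True, exist_ok=True)  # create intermediate directories
    path.write_text(text, encoding="utf-8")
    return file_etag(path)  # hand back the new etag for the next write
```

A real implementation would perform the check and the write while holding the collection's write lock, so no other writer can slip in between them.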
edit(path, old_text=None, new_text='', if_match=None, line_start=None, line_end=None)
Patch a section of a document.
Replaces the single occurrence of old_text with new_text, or replaces the line range [line_start, line_end] when line numbers are given instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative document path. | required |
| old_text | str \| None | Exact text to replace (must occur exactly once). Mutually exclusive with line_start / line_end. | None |
| new_text | str | Replacement text (may be empty to delete old_text). | '' |
| if_match | str \| None | Optional etag for optimistic concurrency. | None |
| line_start | int \| None | 1-based start line for line-range mode. | None |
| line_end | int \| None | 1-based end line (inclusive) for line-range mode. | None |
Returns:
| Type | Description |
|---|---|
| EditResult | An EditResult object. |
Raises:
| Type | Description |
|---|---|
| EditConflictError | If old_text is not found or appears more than once. |
| ReadOnlyError | If the collection is read-only. |
| ConcurrentModificationError | If if_match is provided and does not match. |
| ValueError | If path escapes the source directory. |
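Both editing modes can be sketched as a pure string transformation. apply_edit() is hypothetical. It assumes line numbers count from the first line of the text passed in; whether the library counts frontmatter lines is not stated here. Raising ValueError for mixed modes is also an assumption, and EditConflictError is defined locally.

```python
class EditConflictError(Exception):
    """Local stand-in for the library exception of the same name."""


def apply_edit(text, old_text=None, new_text="", line_start=None, line_end=None):
    if old_text is not None:
        if line_start is not None or line_end is not None:
            raise ValueError("old_text and line_start/line_end are mutually exclusive")
        count = text.count(old_text) if old_text else 0
        if count != 1:
            raise EditConflictError(f"old_text occurs {count} times, expected exactly 1")
        return text.replace(old_text, new_text, 1)

    # Line-range mode: 1-based and inclusive at both ends.
    lines = text.splitlines(keepends=True)
    if line_start is None or line_end is None:
        raise ValueError("line-range mode needs both line_start and line_end")
    if not 1 <= line_start <= line_end <= len(lines):
        raise ValueError(f"line range {line_start}-{line_end} is out of bounds")
    # Keep the following line on its own line if the replacement lacks a newline.
    if new_text and not new_text.endswith("\n") and line_end < len(lines):
        new_text += "\n"
    return "".join(lines[: line_start - 1]) + new_text + "".join(lines[line_end:])
```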
delete(path, if_match=None)
Delete a document or attachment.
Removes the file from disk and purges its entries from the FTS and vector indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Relative path of the document or attachment to remove. | required |
| if_match | str \| None | Optional etag for optimistic concurrency. | None |
Returns:
| Type | Description |
|---|---|
| DeleteResult | A DeleteResult object. |
Raises:
| Type | Description |
|---|---|
| ReadOnlyError | If the collection is read-only. |
| ConcurrentModificationError | If if_match is provided and does not match. |
| DocumentNotFoundError | If path does not exist. |
rename(old_path, new_path, if_match=None, *, update_links=False)
Rename or move a document or attachment.
Moves the file on disk and updates the FTS / vector indices. When
update_links is True, all wikilinks and markdown links in other
documents that pointed to old_path are rewritten to new_path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| old_path | str | Current relative path of the document or attachment. | required |
| new_path | str | Desired relative path after the move. | required |
| if_match | str \| None | Optional etag for optimistic concurrency. | None |
| update_links | bool | When True, rewrite links in other documents that point to old_path so they point to new_path. | False |
Returns:
| Type | Description |
|---|---|
| RenameResult | A RenameResult object. |
Raises:
| Type | Description |
|---|---|
| ReadOnlyError | If the collection is read-only. |
| ConcurrentModificationError | If if_match is provided and does not match. |
| DocumentNotFoundError | If old_path does not exist. |
| ValueError | If old_path or new_path escapes the source directory. |
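Link rewriting for update_links=True can be approximated with two regular expressions. This is a hypothetical sketch, not the library's rewriting rules. It assumes links use vault-relative targets and treats [[target]], [[target|alias]] and [[target#heading]] as wikilink forms. It ignores markdown links that are relative to the linking file or that carry anchors.

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|#]+)((?:[#|][^\]]*)?)\]\]")
MDLINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_links(text: str, old_path: str, new_path: str) -> str:
    """Point [[wikilinks]] and [label](path) links at new_path instead of old_path."""
    old_stem, new_stem = old_path.removesuffix(".md"), new_path.removesuffix(".md")

    def fix_wiki(m: re.Match) -> str:
        target, suffix = m.group(1), m.group(2)  # suffix keeps "#heading" or "|alias"
        if target == old_path:
            target = new_path
        elif target == old_stem:
            target = new_stem                    # extension-less wikilink form
        return f"[[{target}{suffix}]]"

    def fix_md(m: re.Match) -> str:
        label, target = m.group(1), m.group(2)
        return f"[{label}]({new_path if target == old_path else target})"

    return MDLINK.sub(fix_md, WIKILINK.sub(fix_wiki, text))
```

Matching whole targets (not substrings) keeps a link to a similarly named note such as Inbox/ideas untouched.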