Embedding Providers
The providers module defines an abstract base class for embedding providers and three concrete implementations for OpenAI, Ollama, and FastEmbed.
Quick Start
```python
from markdown_vault_mcp.providers import get_embedding_provider

# Auto-detect based on environment variables
provider = get_embedding_provider()

# Embed a batch of texts
vectors = provider.embed(["hello world", "example text"])
print(f"Dimension: {provider.dimension}")
```
Provider Selection
The `get_embedding_provider()` function auto-detects a provider, checking the candidates in this order and returning the first one available:
- OpenAI, if `OPENAI_API_KEY` is set
- Ollama, if `OLLAMA_HOST` is reachable
- FastEmbed, if the package is installed
To override auto-detection, set `MARKDOWN_VAULT_MCP_EMBEDDING_PROVIDER` to `openai`, `ollama`, or `fastembed`.
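As a rough illustration of the override, the sketch below reads the variable and checks it against the three documented names. The helper `read_provider_override` is hypothetical and not part of `markdown_vault_mcp`. The case-insensitive matching and the `ValueError` on unknown values are assumptions of this sketch, not confirmed library behaviour.

```python
import os

# The three provider names documented for the override variable.
KNOWN_PROVIDERS = ("openai", "ollama", "fastembed")
ENV_VAR = "MARKDOWN_VAULT_MCP_EMBEDDING_PROVIDER"


def read_provider_override(environ=None):
    """Return the validated override, or None when the variable is unset or empty.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    # Assumption: surrounding whitespace and letter case are ignored.
    raw = environ.get(ENV_VAR, "").strip().lower()
    if not raw:
        return None  # no override: fall back to auto-detection
    if raw not in KNOWN_PROVIDERS:
        # Raising ValueError is this sketch's choice.
        raise ValueError(f"{ENV_VAR} must be one of {KNOWN_PROVIDERS}, got {raw!r}")
    return raw
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without touching the real process environment.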
API Reference
`EmbeddingProvider`
Bases: ABC
Abstract base class for embedding providers.
`dimension` (abstractmethod, property)
Embedding dimension size.
Returns:
| Type | Description |
|---|---|
| `int` | Integer dimension of each embedding vector. |
`provider_name` (abstractmethod, property)
Stable provider identifier for index compatibility metadata.
`model_name` (abstractmethod, property)
Stable model identifier for index compatibility metadata.
`embed(texts)` (abstractmethod)
Embed a batch of texts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of strings to embed. | required |
Returns:
| Type | Description |
|---|---|
| `list[list[float]]` | List of embedding vectors, one per input text. |
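To make the abstract interface concrete, here is a minimal sketch of a custom provider. So that the block runs on its own, `EmbeddingProvider` is redefined locally as a stand-in that mirrors the four documented members. In real code you would presumably subclass the class from `markdown_vault_mcp.providers` instead. `HashProvider` is a hypothetical toy whose vectors carry no semantic meaning.

```python
from abc import ABC, abstractmethod
import hashlib


class EmbeddingProvider(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @property
    @abstractmethod
    def dimension(self) -> int: ...

    @property
    @abstractmethod
    def provider_name(self) -> str: ...

    @property
    @abstractmethod
    def model_name(self) -> str: ...

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashProvider(EmbeddingProvider):
    """Toy provider: derives a deterministic vector from a SHA-256 digest."""

    def __init__(self, dimension: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, which caps the vector length.
        if not 1 <= dimension <= 32:
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    @property
    def dimension(self) -> int:
        return self._dimension

    @property
    def provider_name(self) -> str:
        return "hash"

    @property
    def model_name(self) -> str:
        return f"sha256-{self._dimension}"

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale each byte to [0, 1]; one vector per input text.
            vectors.append([b / 255.0 for b in digest[: self._dimension]])
        return vectors
```

Because `provider_name` and `model_name` are described as index compatibility metadata, the toy class keeps both stable for a given configuration.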
`OllamaProvider(host, model, *, cpu_only=False)`
Bases: EmbeddingProvider
Embedding provider backed by the Ollama REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `host` | `str` | Base URL of the Ollama server. | required |
| `model` | `str` | Model name to use for embeddings. | required |
| `cpu_only` | `bool` | When | `False` |
Initialise OllamaProvider with explicit parameters.
Raises:
| Type | Description |
|---|---|
| `ImportError` | If |
`dimension` (property)
Embedding dimension size.
Embeds a test string on first access to determine the dimension.
Returns:
| Type | Description |
|---|---|
| `int` | Integer dimension of each embedding vector. |
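The probe-on-first-access behaviour described above can be sketched as a cached property. `ProbedDimension`, the probe string, and `fake_embed` are all hypothetical names chosen for illustration. The provider's actual probe text and caching details are not documented here.

```python
class ProbedDimension:
    """Caches the vector length of a single probe embedding."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable: list[str] -> list[list[float]]
        self._dimension = None     # unknown until first access

    @property
    def dimension(self) -> int:
        if self._dimension is None:
            # One extra embedding call on first access; later reads hit the cache.
            probe = self._embed_fn(["dimension probe"])
            self._dimension = len(probe[0])
        return self._dimension


calls = []


def fake_embed(texts):
    """Stand-in backend that records each call and returns 3-dimensional vectors."""
    calls.append(list(texts))
    return [[0.0, 0.0, 0.0] for _ in texts]


probed = ProbedDimension(fake_embed)
```

A practical consequence of this design is that the first read of `dimension` costs one request to the backend, so it can fail for the same reasons `embed` can.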
`embed(texts)`
Embed a batch of texts via the Ollama REST API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of strings to embed. | required |
Returns:
| Type | Description |
|---|---|
| `list[list[float]]` | List of embedding vectors, one per input text. |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the Ollama API returns an error response. |
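For orientation, a batch embedding call against Ollama might look like the sketch below. It uses only the standard library and targets Ollama's `/api/embed` endpoint. Whether `OllamaProvider` uses this endpoint, how it applies `cpu_only`, and what timeout it sets are not stated in this reference, so treat those as assumptions. The three function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_embed_request(host: str, model: str, texts: list[str]) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=host.rstrip("/") + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embed_response(body: bytes, expected: int) -> list[list[float]]:
    """Extract the vectors, raising RuntimeError on an error or short response."""
    data = json.loads(body)
    if "error" in data:
        raise RuntimeError(f"Ollama error: {data['error']}")
    embeddings = data.get("embeddings", [])
    if len(embeddings) != expected:
        raise RuntimeError(f"expected {expected} vectors, got {len(embeddings)}")
    return embeddings


def ollama_embed(host: str, model: str, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the batch and return one vector per input text."""
    request = build_embed_request(host, model, texts)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as exc:
        # Non-2xx statuses surface as HTTPError; map them to RuntimeError.
        raise RuntimeError(f"Ollama returned HTTP {exc.code}") from exc
    return parse_embed_response(body, expected=len(texts))
```

Splitting request construction and response parsing from the network call keeps the error mapping testable without a running server.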
`OpenAIProvider(api_key)`
Bases: EmbeddingProvider
Embedding provider backed by the OpenAI Embeddings API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | OpenAI API key for authentication. | required |
Uses the `text-embedding-3-small` model.
Initialise OpenAIProvider with an explicit API key.
Raises:
| Type | Description |
|---|---|
| `ImportError` | If |
| `RuntimeError` | If |
`dimension` (property)
Embedding dimension size.
Embeds a test string on first access to determine the dimension.
Returns:
| Type | Description |
|---|---|
| `int` | Integer dimension of each embedding vector. |
`embed(texts)`
Embed a batch of texts via the OpenAI Embeddings API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of strings to embed. | required |
Returns:
| Type | Description |
|---|---|
| `list[list[float]]` | List of embedding vectors in input order. |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the OpenAI API returns an error response. |
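The "in input order" guarantee can be illustrated with the response shape of the OpenAI Embeddings API, where each item in `data` carries an `index` and an `embedding`. The sketch below sorts by `index` and checks that every input is covered. `vectors_in_input_order` is a hypothetical helper, the sample payload is made up, and how `OpenAIProvider` enforces ordering internally is an assumption.

```python
def vectors_in_input_order(response: dict, expected: int) -> list[list[float]]:
    """Return embeddings sorted by their `index` field, one per input text."""
    items = sorted(response["data"], key=lambda item: item["index"])
    if [item["index"] for item in items] != list(range(expected)):
        raise RuntimeError("response does not cover every input exactly once")
    return [item["embedding"] for item in items]


# Made-up payload with items deliberately out of order.
sample_response = {
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ]
}

ordered = vectors_in_input_order(sample_response, expected=2)
```

Here `ordered[0]` corresponds to the first input text regardless of the order in which the items arrived.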
`FastEmbedProvider(model_name='BAAI/bge-small-en-v1.5', cache_dir=None)`
Bases: EmbeddingProvider
Embedding provider backed by the local fastembed library.
The `fastembed` package is imported lazily at instantiation time, so it only needs to be installed when this provider is used.
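The lazy-import pattern described above can be sketched generically: the optional dependency is imported inside `__init__` rather than at module level, and a missing package is reported with an actionable message. `LazyBackend` and the wording of the hint are hypothetical. The provider's real error text is not shown in this reference.

```python
import importlib


class LazyBackend:
    """Imports an optional dependency only when an instance is created."""

    def __init__(self, module_name: str, install_hint: str) -> None:
        try:
            # Deferred import: importing the enclosing module never needs the package.
            self.module = importlib.import_module(module_name)
        except ImportError as exc:
            raise ImportError(
                f"{module_name!r} is required for this provider; {install_hint}"
            ) from exc
```

With this shape, users who rely on a different provider never pay for, or trip over, the optional package.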
Initialise FastEmbed model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | FastEmbed model identifier. | `'BAAI/bge-small-en-v1.5'` |
| `cache_dir` | `str \| None` | Optional model cache directory. | `None` |
Raises:
| Type | Description |
|---|---|
| `ImportError` | If |
`dimension` (property)
Embedding dimension size from the loaded model.
Returns:
| Type | Description |
|---|---|
| `int` | Integer dimension of each embedding vector. |
`embed(texts)`
Embed a batch of texts using the local fastembed model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of strings to embed. | required |
Returns:
| Type | Description |
|---|---|
| `list[list[float]]` | List of embedding vectors, one per input text. |
`get_embedding_provider(config)`
Auto-detect and return an embedding provider from config.
Checks `config.embedding_provider` for an explicit selection. When that field is `None`, probes for available providers in this order:
- If `config.openai_api_key` is set → `OpenAIProvider`.
- If Ollama is reachable at `config.ollama_host` → `OllamaProvider`.
- If `fastembed` can be imported → `FastEmbedProvider`.
- Otherwise, raises `RuntimeError` with installation instructions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `CollectionConfig` | Collection configuration containing embedding settings. | required |
Returns:
| Type | Description |
|---|---|
| `EmbeddingProvider` | An initialised :class: |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If no provider is available and |
| `ValueError` | If |
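The selection order documented for `get_embedding_provider` can be modelled as a small decision function. This is a sketch under stated assumptions: `SketchConfig` is a stand-in holding only the three fields named above (the real `CollectionConfig` is not shown here), `choose_provider` is hypothetical, the reachability and import checks are injected as callables, and validation of an explicit name is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchConfig:
    """Stand-in for CollectionConfig with only the fields named in the docs."""

    embedding_provider: Optional[str] = None
    openai_api_key: Optional[str] = None
    ollama_host: Optional[str] = None


def choose_provider(
    config: SketchConfig,
    ollama_reachable: Callable[[str], bool],
    fastembed_importable: Callable[[], bool],
) -> str:
    """Return the name of the provider the documented order would select."""
    if config.embedding_provider is not None:
        # Explicit selection wins; name validation is omitted in this sketch.
        return config.embedding_provider
    if config.openai_api_key:
        return "openai"
    if config.ollama_host and ollama_reachable(config.ollama_host):
        return "ollama"
    if fastembed_importable():
        return "fastembed"
    raise RuntimeError("no embedding provider available; install or configure one")
```

Injecting the two probes keeps the ordering logic separate from network and import side effects, which makes each branch easy to check in isolation.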