Tools¶
Scholar MCP provides 28 tools organised by scholarly source type: Papers, Patents, Books, and Standards are peer source domains; the remaining sections (Cross-source Utility, PDF Conversion, Task Polling) are cross-cutting. All tools return JSON.
Coverage by domain
Per-domain depth is uneven — papers currently have the richest tool surface (citation graph, recommendations, cross-referencing to all three other domains); standards are the leanest. That reflects public data availability, not a value hierarchy. Parity work is tracked in GitHub issues and milestones.
All tools include MCP tool annotations:
- Read-only tools: readOnlyHint=true, destructiveHint=false, openWorldHint=true
- Write tools (PDF): readOnlyHint=false, destructiveHint=false, openWorldHint=true
- Task polling tools: readOnlyHint=true, destructiveHint=false, openWorldHint=false
Async Task Queue¶
Long-running operations return immediately with a task ID instead of blocking:
- PDF tools always queue (unless the result is already cached locally)
- S2 tools queue when the Semantic Scholar API responds with HTTP 429 (rate limited)
When a tool queues an operation, it returns a task ID in place of the normal result.
Poll with get_task_result to check status and retrieve the result. Task results expire after 10 minutes (S2 tools) or 1 hour (PDF tools).
Papers — Search & Retrieval¶
search_papers¶
Full-text search across the Semantic Scholar corpus.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | (required) | Search query string |
| fields | string | "compact" | Field set: compact, standard, or full |
| limit | int | 10 | Results per page (max 100) |
| offset | int | 0 | Pagination offset |
| year_start | int | -- | Filter: earliest publication year |
| year_end | int | -- | Filter: latest publication year |
| fields_of_study | list[string] | -- | S2 field-of-study names (e.g. ["Computer Science", "Physics"]) |
| venue | string | -- | Filter by venue name |
| min_citations | int | -- | Minimum citation count |
| sort | string | "relevance" | Sort order: relevance, citations, or year |
Returns: {"data": [...], "total": N} where each item contains the requested field set.
Field sets:
- compact -- paperId, title, year, venue, citationCount
- standard -- compact + authors, externalIds, abstract
- full -- standard + tldr, openAccessPdf, fieldsOfStudy, referenceCount
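As a rough sketch of how limit, offset, and total fit together, the helper below pages through search_papers results. The `call_tool` callable is a hypothetical stand-in for whatever MCP client invocation you use (it is not part of Scholar MCP), and the sketch assumes every call returns a result directly rather than a queued task ID.

```python
from typing import Any, Callable, Iterator

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed JSON result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_search_results(
    call_tool: CallTool,
    query: str,
    page_size: int = 100,
    max_results: int = 500,
) -> Iterator[dict[str, Any]]:
    """Yield search_papers items page by page until total or max_results is reached."""
    page_size = min(page_size, 100)  # documented per-page maximum
    offset = 0
    yielded = 0
    while yielded < max_results:
        page = call_tool(
            "search_papers",
            {"query": query, "fields": "compact", "limit": page_size, "offset": offset},
        )
        items = page.get("data", [])
        if not items:
            break  # nothing more to read
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        offset += len(items)
        if offset >= page.get("total", 0):
            break  # every matching paper has been seen
```

The `max_results` cap is a design choice of this sketch: it keeps a broad query from walking the whole corpus.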
get_paper¶
Fetch full metadata for a single paper.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | DOI, S2 paper ID, ARXIV:id, ACM:id, or PMID:id |
Returns: Full paper metadata (always uses the full field set) or {"error": "not_found"}.
Results are cached for 30 days. Identifier aliases (e.g. DOI to S2 ID) are cached permanently.
get_author¶
Fetch an author profile or search by name.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | Numeric S2 author ID for direct lookup, or a name string for search |
| limit | int | 20 | Max publications to return (direct lookup) |
| offset | int | 0 | Pagination offset for publications |
Returns:
- Direct lookup (numeric ID): author profile with paginated publications list
- Name search (text): {"candidates": [...]} with up to 5 matching authors
Papers — Citation Graph¶
get_citations¶
Forward citations -- papers that cite the given paper.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | Paper ID (DOI, S2 ID, etc.) |
| fields | string | "compact" | Field set for citing papers |
| limit | int | 20 | Max results (max 1000) |
| offset | int | 0 | Pagination offset |
| year_start | int | -- | Filter: earliest year |
| year_end | int | -- | Filter: latest year |
| fields_of_study | list[string] | -- | Field-of-study filter |
| min_citations | int | -- | Minimum citation count filter |
Returns: {"data": [{"citingPaper": {...}}, ...]}.
get_references¶
Backward references -- papers cited by the given paper.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | Paper ID (DOI, S2 ID, etc.) |
| fields | string | "compact" | Field set for cited papers |
| limit | int | 50 | Max results (max 1000) |
| offset | int | 0 | Pagination offset |
Returns: {"data": [{"citedPaper": {...}}, ...]}.
get_citation_graph¶
BFS traversal from one or more seed papers, collecting nodes and edges.
| Parameter | Type | Default | Description |
|---|---|---|---|
| seed_ids | list[string] | (required) | 1--10 seed paper IDs |
| direction | string | "citations" | citations (forward), references (backward), or both |
| depth | int | 1 | BFS depth (1--3, clamped) |
| max_nodes | int | 100 | Hard cap on collected nodes |
| year_start | int | -- | Filter: earliest year |
| year_end | int | -- | Filter: latest year |
| fields_of_study | list[string] | -- | Field-of-study filter |
| min_citations | int | -- | Minimum citation count filter |
Returns:
{
"nodes": [{"id": "...", "title": "...", "year": 2024, "citationCount": 42}, ...],
"edges": [{"source": "id1", "target": "id2"}, ...],
"stats": {
"total_nodes": 42,
"total_edges": 67,
"depth_reached": 2,
"truncated": false
}
}
Controlling graph size
Start with depth=1 and a small max_nodes to get an overview, then increase as needed. depth=3 with direction=both can produce very large graphs.
See Citation Graphs guide for usage patterns.
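The nodes/edges response can be turned into an adjacency map for local analysis. The sketch below is one possible way to do that. It assumes only the response shape shown above and makes no claim about whether an edge's source is the citing or the cited paper; the helper names are illustrative.

```python
from typing import Any


def build_adjacency(graph: dict[str, Any]) -> dict[str, list[str]]:
    """Map every node id to the list of ids its outgoing edges point at."""
    adjacency: dict[str, list[str]] = {node["id"]: [] for node in graph["nodes"]}
    for edge in graph["edges"]:
        # setdefault tolerates an edge whose endpoint is missing from "nodes".
        adjacency.setdefault(edge["source"], []).append(edge["target"])
        adjacency.setdefault(edge["target"], [])
    return adjacency


def most_linked(graph: dict[str, Any], top: int = 3) -> list[tuple[str, int]]:
    """Rank node ids by how many edges in this graph point at them."""
    counts: dict[str, int] = {}
    for edge in graph["edges"]:
        counts[edge["target"]] = counts.get(edge["target"], 0) + 1
    # Sort by descending count, then by id so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Check stats.truncated before drawing conclusions from such counts, since a graph capped by max_nodes is only a partial view.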
find_bridge_papers¶
Find the shortest citation path between two papers.
| Parameter | Type | Default | Description |
|---|---|---|---|
| source_id | string | (required) | Starting paper ID |
| target_id | string | (required) | Target paper ID |
| max_depth | int | 4 | Maximum BFS depth |
| direction | string | "both" | citations, references, or both |
Returns:
{
"found": true,
"path": [
{"paperId": "source", "title": "...", ...},
{"paperId": "bridge", "title": "...", ...},
{"paperId": "target", "title": "...", ...}
]
}
Or {"found": false} if no path exists within max_depth.
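To illustrate the idea of a depth-limited shortest-path search, here is a minimal breadth-first search over an in-memory adjacency map. It is a sketch of the general technique, not the server's implementation: the real tool expands neighbours via API calls, and treating max_depth as a maximum number of edges is an assumption.

```python
from collections import deque
from typing import Optional


def shortest_path(
    neighbors: dict[str, list[str]],
    source: str,
    target: str,
    max_depth: int = 4,
) -> Optional[list[str]]:
    """Return the shortest id path from source to target, or None if out of reach."""
    if source == target:
        return [source]
    parents: dict[str, Optional[str]] = {source: None}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in neighbors.get(node, []):
            if nxt in parents:
                continue  # already reached by a path that is at least as short
            parents[nxt] = node
            if nxt == target:
                # Walk the parent links back to the source, then reverse.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            frontier.append((nxt, depth + 1))
    return None
```

Because BFS explores level by level, the first time the target is reached the path is guaranteed to use the fewest edges.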
Papers — Recommendations¶
recommend_papers¶
Paper recommendations based on positive (and optional negative) examples.
| Parameter | Type | Default | Description |
|---|---|---|---|
| positive_ids | list[string] | (required) | 1--5 S2 paper IDs as positive examples |
| negative_ids | list[string] | -- | S2 paper IDs to steer recommendations away from |
| limit | int | 10 | Number of recommendations |
| fields | string | "standard" | Field set for returned papers |
Returns: JSON list of recommended papers.
Tip
Recommendations work best with 3--5 positive examples that represent the topic you're interested in. Adding 1--2 negative examples that are close but off-topic helps narrow results.
Papers — Enrichment¶
enrich_paper¶
Augment Semantic Scholar metadata with OpenAlex data.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | S2 paper ID or DOI:xxx |
| fields | list[string] | (required) | Fields to retrieve: affiliations, funders, oa_status, concepts |
Available fields:
| Field | Description |
|---|---|
| affiliations | Institution display names from author affiliations |
| funders | Funding organization names |
| oa_status | Open access status string (e.g. gold, green, hybrid); also includes is_oa boolean |
| concepts | List of {"name": "...", "score": 0.95} topic concepts |
Results are cached for 30 days.
Papers — Citation Generation¶
generate_citations¶
Generate formatted citations for one or more papers. Resolves papers via Semantic Scholar, optionally enriches with OpenAlex metadata, and formats as BibTeX, CSL-JSON, or RIS.
| Parameter | Type | Default | Description |
|---|---|---|---|
| paper_ids | list[string] | (required) | Paper identifiers (S2 IDs, DOIs, arXiv IDs, etc.). Max 100. |
| citation_format | string | "bibtex" | Output format: bibtex, csl-json, or ris |
| enrich | boolean | true | Attempt OpenAlex enrichment for missing venue data |
BibTeX output includes entry type inference (@article, @inproceedings, @misc, @book), proper author formatting ({Last}, First), title casing preservation, DOI, arXiv eprint fields, and special character escaping. Papers with book_metadata (ISBN or publisher) are emitted as @book entries with publisher, edition, and isbn fields.
CSL-JSON output returns {"citations": [...], "errors": [...]} -- the citations array contains standard CSL-JSON objects compatible with Zotero, Mendeley, Pandoc, and other CSL processors. Book entries use type: "book" with publisher and ISBN fields.
RIS output uses standard RIS tags (TY, AU, TI, PY, JO/BT, DO, UR, AB, ER). Book entries use TY - BOOK with PB (publisher) and SN (ISBN) tags.
Papers that fail to resolve are reported inline (BibTeX/RIS: as comments, CSL-JSON: in the errors array) rather than failing the entire request. When all papers fail, a structured error is returned: {"error": "no_papers_resolved", "failed": [...]}.
Enrichment
When enrich is enabled and a paper has no venue but has a DOI, the tool queries OpenAlex to fill in the venue name. This improves citation quality for papers where Semantic Scholar has incomplete metadata.
Books¶
Book tools use Open Library and Google Books as data sources. No API key is required (a Google Books API key is optional for higher rate limits). Rate limits are handled automatically; if an API is temporarily unavailable, calls queue and return a task ID (see Async Task Queue).
search_books¶
Search for books by title, author, or free text via Open Library. For best results, use the title and author parameters rather than query — they use dedicated search indexes and return far better results.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | null | Free-text fallback. Use title/author when known. |
| title | string | null | Book title or partial title (recommended). |
| author | string | null | Author name (recommended). |
| limit | int | 10 | Maximum results to return (max 50) |
At least one of query, title, or author must be provided.
When only query is given, it is first tried as a title search (better relevance) and falls back to free-text if no results are found. When author is given with multiple tokens (e.g. "Frank Duffy") and initial results are thin, a broadened search is automatically attempted to catch name variants (e.g. Frank → Francis).
Returns: JSON list of book records. Each record contains:
[
{
"title": "Deep Learning",
"authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"],
"publisher": "MIT Press",
"year": 2016,
"edition": null,
"isbn_10": "0262035618",
"isbn_13": "9780262035613",
"openlibrary_work_id": "OL17953442W",
"openlibrary_edition_id": "OL26423929M",
"cover_url": "https://covers.openlibrary.org/b/isbn/9780262035613-M.jpg",
"google_books_url": "https://books.google.com/books?id=Np9SDQAAQBAJ",
"worldcat_url": "https://www.worldcat.org/isbn/9780262035613",
"snippet": "An introduction to a broad range of topics in deep learning...",
"cover_path": null,
"subjects": ["Machine learning", "Artificial intelligence"],
"page_count": 800,
"description": null
}
]
get_book¶
Fetch full metadata for a single book by ISBN or Open Library identifier. Optionally download and cache the cover image locally.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | ISBN-10, ISBN-13, Open Library work ID, or edition ID |
| include_editions | bool | false | If true, fetch work and list editions |
| download_cover | bool | false | If true, download and cache the cover image locally; adds cover_path to result |
| cover_size | string | "M" | Cover image size: S (small), M (medium), or L (large) |
Identifier formats:
| Format | Example |
|---|---|
| ISBN-13 | 9780262035613 |
| ISBN-10 | 0262035618 |
| ISBN with hyphens | 978-0-262-03561-3 |
| Open Library work ID | OL17953442W |
| Open Library edition ID | OL26423929M |
Returns: A single book record (same shape as items returned by search_books), or {"error": "not_found", "identifier": "..."} if not found.
Results are cached. Work and edition lookups are cached by their respective Open Library IDs; ISBN lookups are also stored under the resolved ISBN-13. When download_cover=true, the cover image is saved to <cache_dir>/covers/<isbn>_<size>.jpg and the path is returned as cover_path.
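Since ISBN lookups are stored under the resolved ISBN-13, it can help to see how the identifier formats above map to one key. The function below is a sketch using the standard ISBN-10 to ISBN-13 conversion (prefix 978, recomputed check digit). The function name is illustrative, it does not validate the incoming check digit, and it is not necessarily how the server normalises identifiers.

```python
def normalize_isbn(raw: str) -> str:
    """Return the ISBN-13 form of an ISBN-10 or ISBN-13, ignoring hyphens and spaces."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        # Drop the ISBN-10 check character, prepend 978, recompute the check digit
        # with alternating weights 1 and 3 over the first twelve digits.
        core = "978" + digits[:9]
        total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")
```

With the example identifiers from the table, both `normalize_isbn("0262035618")` and `normalize_isbn("978-0-262-03561-3")` give `"9780262035613"`.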
get_book_excerpt¶
Fetch a book excerpt and description from Google Books by ISBN. Shows preview availability and a link to the Google Books preview page.
| Parameter | Type | Default | Description |
|---|---|---|---|
| isbn | string | (required) | ISBN-10 or ISBN-13 |
Returns:
{
"title": "Deep Learning",
"authors": ["Ian Goodfellow", "Yoshua Bengio", "Aaron Courville"],
"description": "An introduction to a broad range of topics in deep learning...",
"snippet": "Deep learning is a form of machine learning that enables...",
"preview_link": "https://books.google.com/books?id=Np9SDQAAQBAJ&printsec=frontcover",
"preview_available": true
}
Or {"error": "not_found", "isbn": "..."} if Google Books has no matching volume.
Write-tagged
This tool is write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true.
recommend_books¶
Recommend books for a subject via the Open Library subject API. Results are sorted by edition count (a proxy for popularity).
| Parameter | Type | Default | Description |
|---|---|---|---|
| subject | string | (required) | Subject or topic (e.g. "machine learning", "algorithms", "computer vision") |
| limit | int | 10 | Maximum results to return (max 50) |
Returns: A JSON list of book records. Each record has the same shape as search_books results, with fields populated from the Open Library subject API (title, authors, Open Library work ID, cover URL). ISBN and edition fields are null since subject results are work-level.
Results are cached for 7 days, keyed by the normalized subject slug. Up to 50 results are fetched and cached; the limit parameter slices the cached pool on return.
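The fetch-a-pool-then-slice behaviour can be pictured with a small in-memory cache. This is only a sketch: the `fetch` callable, the class name, and the slug rule (lowercase, whitespace collapsed to underscores) are assumptions made for illustration, and the real cache is not necessarily held in memory.

```python
import time
from typing import Any, Callable

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, as documented
POOL_SIZE = 50  # documented number of results fetched and cached


class SubjectCache:
    """Cache a pool of results per subject slug and slice it on every call."""

    def __init__(
        self,
        fetch: Callable[[str, int], list[dict[str, Any]]],  # hypothetical fetcher
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._entries: dict[str, tuple[float, list[dict[str, Any]]]] = {}

    def recommend(self, subject: str, limit: int = 10) -> list[dict[str, Any]]:
        slug = "_".join(subject.lower().split())  # assumed normalisation rule
        now = self._clock()
        entry = self._entries.get(slug)
        if entry is None or now - entry[0] >= CACHE_TTL_SECONDS:
            # Miss or expired: fetch the full pool once, regardless of limit.
            entry = (now, self._fetch(slug, POOL_SIZE))
            self._entries[slug] = entry
        return entry[1][: min(limit, POOL_SIZE)]
```

Fetching the full pool up front means a later call with a larger limit is served from the cache instead of triggering a second request.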
Auto-Enrichment¶
Scholar MCP uses a phased enrichment pipeline that automatically augments paper and book results with metadata from multiple sources. Enrichment runs in two phases:
- Phase 0 (primary): OpenAlex (OA status, affiliations, funders, concepts) and CrossRef (publisher, page ranges, container titles)
- Phase 1 (secondary): Open Library (book metadata for papers with ISBNs) and Google Books (preview links, excerpts, snippets)
When get_paper, get_citations, get_references, or get_citation_graph retrieves a paper that has an ISBN in its externalIds field, Open Library metadata is automatically fetched and attached as a book_metadata key on the paper record.
Trigger condition: externalIds.ISBN is present and non-empty.
Added field: book_metadata — a dict containing:
| Field | Description |
|---|---|
| publisher | Publisher name |
| edition | Edition string (e.g. "2nd ed.") |
| isbn_13 | ISBN-13 |
| cover_url | Cover image URL (from Open Library covers) |
| openlibrary_work_id | Open Library work ID (e.g. OL17953442W) |
| description | Work description, if available |
| subjects | List of subject strings |
| page_count | Page count, if known |
| authors | List of author name strings (resolved from Open Library work metadata) |
CrossRef enrichment adds publisher metadata, page ranges, and container titles to paper results when a DOI is available. Google Books enrichment adds google_books_url and snippet to book records when an ISBN is present.
Enrichment failures are silently skipped — if a source is unreachable or returns no data, the record is returned without that enrichment layer. Up to 5 concurrent requests are made per batch.
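A bounded-concurrency pipeline with silent failure handling can be sketched with asyncio. The code below is an illustration under stated assumptions, not the project's implementation: `enrich_batch` and the enricher callables are hypothetical, each enricher returns a dict of extra fields (or nothing), and the limit of 5 is applied per record rather than per request.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

MAX_CONCURRENCY = 5  # documented per-batch concurrency limit

Enricher = Callable[[dict[str, Any]], Awaitable[Optional[dict[str, Any]]]]


async def enrich_batch(
    records: list[dict[str, Any]],
    enrichers: list[Enricher],
) -> list[dict[str, Any]]:
    """Run enrichers in order on each record, at most MAX_CONCURRENCY records at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)

    async def enrich_one(record: dict[str, Any]) -> dict[str, Any]:
        async with semaphore:
            for enricher in enrichers:  # list order stands in for the phase order
                try:
                    extra = await enricher(record)
                except Exception:
                    continue  # a failing source is skipped, the record is kept
                if extra:
                    record.update(extra)
        return record

    # Work on copies so the caller's records are left untouched.
    return await asyncio.gather(*(enrich_one(dict(r)) for r in records))
```

A record passes through even when every enricher fails, which matches the "returned without that enrichment layer" behaviour described above.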
Cross-source Utility¶
batch_resolve¶
Resolve up to 100 paper, patent, or book identifiers to full metadata in a single call.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifiers | list[string] | (required) | Up to 100 IDs: S2 paper IDs, DOI:xxx, plain DOIs, patent numbers (e.g. EP1234567A1), or ISBNs (prefixed ISBN:, e.g. ISBN:9780262035613) |
| fields | string | "standard" | Field set (applies to paper results only) |
Returns: JSON list of resolved items:
- Paper results have a "paper" key. Papers not found in Semantic Scholar are automatically tried via OpenAlex (by DOI); results from OpenAlex include "source": "openalex". When the citation string contains chapter patterns (e.g. "Chapter 3", "pp. 45-67"), a chapter_info dict is attached with parsed chapter/page information.
- Patent results have a "patent" key and "source_type": "patent". Patent numbers are auto-detected by their two-letter country prefix (e.g. EP, US, WO) and routed to the EPO OPS API.
- Book results have a "book" key and "source_type": "book". ISBNs (prefixed with ISBN:) are routed to Open Library.
- Unresolved items have an "error" key.
Patents¶
Credentials required
Patent tools require EPO OPS credentials. When SCHOLAR_MCP_EPO_CONSUMER_KEY and SCHOLAR_MCP_EPO_CONSUMER_SECRET are not set, these tools are automatically hidden. See EPO OPS configuration for setup instructions.
search_patents¶
Search for patents across 100+ patent offices via the EPO Open Patent Services (OPS) API.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | -- | Keyword search over patent titles and abstracts. Optional when using structured filters. |
| cpc_classification | string | -- | CPC classification code filter (e.g. "H01M10/00") |
| applicant | string | -- | Applicant (assignee) name filter |
| inventor | string | -- | Inventor name filter |
| date_from | string | -- | Earliest date (YYYY-MM-DD) |
| date_to | string | -- | Latest date (YYYY-MM-DD) |
| date_type | string | "publication" | Date field: publication, filing, or priority |
| jurisdiction | string | -- | Country code filter (e.g. EP, US, WO) |
| limit | int | 10 | Results per page (max 100) |
| offset | int | 0 | Pagination offset |
At least one parameter must be provided.
Returns: {"total_count": N, "references": [...]} where each reference has country, number, and kind fields.
Query syntax
The tool translates parameters into EPO CQL internally. query searches titles and abstracts only — it will not find patents where the keyword appears only in inventor/applicant names. To find all patents by an inventor or from an applicant, omit query and use only the structured filters (e.g. inventor="Smith"). Combining query with filters applies both constraints.
get_patent¶
Fetch detailed information for a single patent by its publication number.
| Parameter | Type | Default | Description |
|---|---|---|---|
| patent_number | string | (required) | Patent number in any format (e.g. EP1234567A1, WO2024/123456, US11,234,567B2) |
| sections | list[string] | ["biblio"] | Sections to retrieve |
Available sections:
| Section | Description | Status |
|---|---|---|
| biblio | Bibliographic metadata (title, applicants, inventors, dates, classifications, abstract) | Available |
| claims | Patent claims text (English preferred) | Available |
| description | Full patent description text (English preferred) | Available |
| family | Patent family members across jurisdictions (country, number, kind, date) | Available |
| legal | Legal status events (date, code, description) | Available |
| citations | Patent and non-patent literature citations, with Semantic Scholar resolution for NPL | Available |
Sections are fetched concurrently where possible (cache lookups run in parallel; EPO API calls are serialised by the client). Each section is cached independently with appropriate TTLs.
Returns: A JSON object with keys matching the requested sections:
{
"patent_number": "EP.1234567.A1",
"biblio": {
"title": "...",
"abstract": "...",
"applicants": ["..."],
"inventors": ["..."],
"publication_number": "EP.1234567.A1",
"publication_date": "2020-01-15",
"filing_date": "2019-06-01",
"priority_date": "2019-01-15",
"family_id": "12345678",
"classifications": ["H04L29/06"],
"url": "https://worldwide.espacenet.com/..."
},
"claims": "1. A method for...\n\n2. The method of claim 1...",
"family": [
{"country": "US", "number": "11234567", "kind": "B2", "date": "2021-03-01"}
],
"legal": [
{"date": "2019-05-01", "code": "APPLICATION", "description": "Application filed"}
],
"citations": {
"patent_refs": [
{"country": "US", "number": "9876543", "kind": "B2"}
],
"npl_refs": [
{"raw": "Smith et al., \"Widget Processing\", 2018, doi:10.1234/test", "paper": {"paperId": "abc123", "title": "Widget Processing"}, "confidence": "high"},
{"raw": "Doe, \"Advanced Widgets\", Ch. 5, pp. 112-130, 2019", "confidence": null, "chapter_info": {"citation_source": "parsed", "chapter_number": 5, "page_start": 112, "page_end": 130}}
]
}
}
When citations is requested, non-patent literature (NPL) references are resolved against Semantic Scholar on a best-effort basis. References with a DOI are resolved with "confidence": "high". References without a DOI or that fail to resolve have "confidence": null. When citation strings contain chapter patterns (e.g. "Ch. 5", "pp. 112-130"), a chapter_info dict is included with parsed chapter and page information.
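Because NPL resolution is best-effort, callers will usually want to separate resolved references from the rest. The helper below is a small sketch based only on the response shape shown above; the function name is illustrative.

```python
from typing import Any


def split_npl_refs(
    patent: dict[str, Any],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split npl_refs into (resolved, unresolved) using the confidence field."""
    refs = patent.get("citations", {}).get("npl_refs", [])
    resolved: list[dict[str, Any]] = []
    unresolved: list[dict[str, Any]] = []
    for ref in refs:
        # A reference counts as resolved only if it carries a matched paper
        # and was marked with high confidence.
        if ref.get("confidence") == "high" and "paper" in ref:
            resolved.append(ref)
        else:
            unresolved.append(ref)
    return resolved, unresolved
```

Unresolved entries still carry the raw citation string, and sometimes chapter_info, so they can be followed up manually.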
get_citing_patents¶
Find patents that cite a given academic paper. Coverage is incomplete -- relies on EPO OPS citation search, which does not capture all patent-to-paper citations. Best results with DOIs of well-known papers.
| Parameter | Type | Default | Description |
|---|---|---|---|
| paper_id | string | (required) | Paper identifier (DOI preferred) |
| limit | int | 10 | Maximum citing patents to return (max 25) |
Returns:
{
"paper_id": "10.1234/test",
"patents": [
{"title": "...", "publication_number": "EP.9999999.A1", "match_source": "epo_search", ...}
],
"total_count": 1,
"note": "Coverage is incomplete. Results come from EPO OPS citation search..."
}
Incomplete coverage
Not all patent-to-paper citations are captured by EPO OPS. Use this tool for discovery, not exhaustive analysis.
fetch_patent_pdf¶
Download a patent PDF via authenticated EPO OPS and optionally convert to Markdown.
| Parameter | Type | Default | Description |
|---|---|---|---|
| patent_number | string | (required) | Patent number in any format (e.g. EP3491801B1, US10123456B2, WO2024/123456) |
| use_vlm | bool | false | Enable VLM enrichment for formulas and figures |
Returns: {"pdf_path": "/data/scholar-mcp/pdfs/<stem>.pdf", "markdown": "...", "md_path": "/data/scholar-mcp/md/<stem>.md"} when docling is configured, or just {"pdf_path": "..."} without it.
Not all patents have full text available via OPS — WO and older EP patents sometimes lack PDFs. Returns {"error": "pdf_not_available"} in that case.
Write-tagged
This tool is write-tagged and hidden when SCHOLAR_MCP_READ_ONLY=true. It also requires EPO OPS credentials.
Standards¶
Scholar MCP supports Tier 1 standards bodies (NIST, IETF, W3C, ETSI) with full metadata and optional full-text conversion. Tier 2 paywalled bodies (ISO, IEC, IEEE) are tracked in GitHub issues.
resolve_standard_identifier¶
Normalise a messy citation string to its canonical form and body.
| Parameter | Type | Default | Description |
|---|---|---|---|
| raw | string | -- | Messy citation string (e.g. "rfc9000", "nist 800-53") |
search_standards¶
Search standards by identifier, title, or free text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | -- | Identifier, title, or free text |
| body | string | null | Filter to one body: NIST, IETF, W3C, ETSI |
| limit | integer | 10 | Max results (max 50) |
get_standard¶
Retrieve a standard by identifier (canonical or fuzzy). Optionally fetches and converts full text via docling.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | -- | Canonical or fuzzy identifier |
| fetch_full_text | boolean | false | Fetch and convert full text via docling |
get_sync_status¶
Reports the last run of each Tier 2 standards sync. Returns one record per body; see scholar-mcp sync-standards in the CLI docs.
Returns: {"runs": [{"body", "upstream_ref", "added", "updated", "unchanged", "withdrawn", "errors", "started_at", "finished_at"}]}.
An empty runs list means no sync has been run yet.
PDF Conversion¶
Write-tagged tools
All PDF tools are tagged as write operations. They are hidden when SCHOLAR_MCP_READ_ONLY=true, which is the default. Set SCHOLAR_MCP_READ_ONLY=false to enable them.
fetch_paper_pdf¶
Download the PDF for a paper. Tries the Semantic Scholar open-access URL first, then falls back to alternative sources: ArXiv (from externalIds), PubMed Central, and Unpaywall (by DOI, requires SCHOLAR_MCP_CONTACT_EMAIL).
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | Paper ID (DOI, S2 ID, etc.) |
Returns: {"path": "/data/scholar-mcp/pdfs/<id>.pdf", "source": "s2_oa"} or an error:
- {"error": "no_oa_pdf"} -- no PDF URL found from any source
- {"error": "download_failed"} -- HTTP error downloading the PDF
The source field indicates where the PDF was obtained: s2_oa, arxiv, pmc, or unpaywall.
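The ordered fallback described above amounts to trying one resolver after another and keeping the first hit. The sketch below shows that pattern with two illustrative resolvers; PubMed Central and Unpaywall are left out because they need network calls. The function names, the `openAccessPdf.url` and `externalIds.ArXiv` lookups, and the arXiv URL pattern are assumptions of this example, not the server's code.

```python
from typing import Any, Callable, Optional

Resolver = Callable[[dict[str, Any]], Optional[str]]


def s2_open_access(paper: dict[str, Any]) -> Optional[str]:
    """Assumed shape: the paper record carries openAccessPdf.url when available."""
    return (paper.get("openAccessPdf") or {}).get("url")


def arxiv(paper: dict[str, Any]) -> Optional[str]:
    """Assumed shape: externalIds holds the arXiv id under the key 'ArXiv'."""
    arxiv_id = (paper.get("externalIds") or {}).get("ArXiv")
    return f"https://arxiv.org/pdf/{arxiv_id}" if arxiv_id else None


def pick_pdf_source(
    paper: dict[str, Any],
    resolvers: list[tuple[str, Resolver]],
) -> dict[str, str]:
    """Return the first source that yields a URL, else the documented error."""
    for source, resolve in resolvers:
        url = resolve(paper)
        if url:
            return {"source": source, "url": url}
    return {"error": "no_oa_pdf"}


# Order matters: earlier entries win, mirroring the documented priority.
RESOLVERS: list[tuple[str, Resolver]] = [("s2_oa", s2_open_access), ("arxiv", arxiv)]
```

Keeping the chain as data (a list of name and callable pairs) makes it easy to add or reorder sources without touching the selection logic.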
convert_pdf_to_markdown¶
Convert a local PDF file to Markdown via docling-serve.
| Parameter | Type | Default | Description |
|---|---|---|---|
| file_path | string | (required) | Absolute path to a PDF file |
| use_vlm | bool | false | Enable VLM enrichment for formulas and figures |
Returns: {"markdown": "...", "path": "/data/scholar-mcp/md/<name>.md", "vlm_used": true/false}.
When use_vlm is requested but VLM is not configured, the response includes a vlm_skip_reason field (e.g. "vlm_api_url_not_configured").
Start without VLM
Standard conversion handles most papers well. Only retry with use_vlm=true when the result has garbled formulas or missing figure descriptions. VLM enrichment processes each formula and figure image individually via an external vision model, which is significantly slower.
Caching: Standard and VLM conversions are cached separately (<stem>.md vs <stem>_vlm.md), so switching modes never overwrites a previous conversion.
Requires SCHOLAR_MCP_DOCLING_URL to be set. VLM enrichment additionally requires SCHOLAR_MCP_VLM_API_URL and SCHOLAR_MCP_VLM_API_KEY.
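The separate cache entries for standard and VLM conversions follow from the file naming. As a sketch, the helper below derives the Markdown path from a PDF path using the documented `<stem>.md` and `<stem>_vlm.md` pattern; the function name and the directory argument are illustrative.

```python
from pathlib import PurePosixPath


def markdown_cache_path(md_dir: str, pdf_path: str, use_vlm: bool = False) -> PurePosixPath:
    """Build the cached Markdown path for a PDF, with a _vlm suffix for VLM runs."""
    stem = PurePosixPath(pdf_path).stem  # file name without the .pdf extension
    suffix = "_vlm" if use_vlm else ""
    return PurePosixPath(md_dir) / f"{stem}{suffix}.md"
```

Because the two modes map to different file names, a VLM retry never overwrites the earlier standard conversion.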
fetch_and_convert¶
Full pipeline: resolve paper, download PDF, convert to Markdown. Uses the same alternative source fallback as fetch_paper_pdf.
| Parameter | Type | Default | Description |
|---|---|---|---|
| identifier | string | (required) | Paper ID (DOI, S2 ID, etc.) |
| use_vlm | bool | false | Enable VLM enrichment |
Returns: On full success:
{
"metadata": {"paperId": "...", "title": "...", ...},
"markdown": "# Paper Title\n...",
"pdf_path": "/data/scholar-mcp/pdfs/<id>.pdf",
"md_path": "/data/scholar-mcp/md/<id>.md",
"pdf_source": "s2_oa",
"vlm_used": false
}
Partial results are returned if a later stage fails (e.g. metadata + error if no OA PDF is available). The pdf_source field indicates the download source. When VLM is requested but not configured, the response includes vlm_skip_reason.
Start without VLM
Same advice as convert_pdf_to_markdown — try standard first, add VLM only if formulas or figures are missing. VLM and standard conversions are cached separately (<id>.md vs <id>_vlm.md).
See PDF Conversion guide for setup instructions.
fetch_pdf_by_url¶
Download a PDF from any URL and optionally convert to Markdown. Use this when you have found an alternative PDF link (e.g. from an author's homepage, a preprint server, or an institutional repository).
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | (required) | Direct URL to a PDF file |
| filename | string | -- | Filename stem for caching (e.g. "smith2024_attention"). Derived from the URL if omitted. |
| use_vlm | bool | false | Enable VLM enrichment for formulas and figures |
Returns: On success (with docling configured):
{
"pdf_path": "/data/scholar-mcp/pdfs/<filename>.pdf",
"markdown": "# Paper Title\n...",
"md_path": "/data/scholar-mcp/md/<filename>.md",
"vlm_used": false
}
Without docling, only pdf_path is returned. The PDF is cached by filename, so subsequent calls with the same filename return immediately.
Task Polling¶
get_task_result¶
Poll for the result of a background task.
| Parameter | Type | Default | Description |
|---|---|---|---|
| task_id | string | (required) | Task ID returned by a queued operation |
Returns: a JSON object describing the task, including its status.
Status values: pending, running, completed, failed. The result field contains the original tool output as a JSON string (only present when completed). On failed, an error field describes the failure.
While the task is in progress (pending or running), the response includes extra fields:
{
"task_id": "a1b2c3d4e5f6",
"status": "running",
"elapsed_seconds": 45,
"tool": "convert_pdf_to_markdown",
"hint": "PDF conversion typically takes 1-5 minutes depending on page count."
}
The hint field gives expected duration — keep polling until the task completes.
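A client-side polling loop might look like the sketch below. The `call_tool` callable is a hypothetical stand-in for your MCP client, and the interval and timeout defaults are arbitrary choices; only the status values and the JSON-string result field come from the description above.

```python
import json
import time
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> parsed JSON result.
CallTool = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_task(
    call_tool: CallTool,
    task_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll get_task_result until the task completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        response = call_tool("get_task_result", {"task_id": task_id})
        status = response.get("status")
        if status == "completed":
            # The result field holds the original tool output as a JSON string.
            return json.loads(response["result"])
        if status == "failed":
            raise RuntimeError(response.get("error", "task failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter is a convenience of this sketch so the loop can be exercised without real delays.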
list_tasks¶
List all active (non-expired) background tasks.
Returns: JSON list of {"task_id": "...", "status": "..."} dicts.