Model Catalog
Per-model narrative metadata read by LLMs when choosing between providers and models. Closed-list providers (openai, gemini, placeholder) use exact-key lookup against MODEL_STYLES. SD WebUI uses the regex-ordered CHECKPOINT_PATTERNS table.
Source: src/image_generation_mcp/providers/model_styles.py. Regenerate with uv run python scripts/render_model_catalog.py.
OpenAI
Models exposed by the openai provider. Each model resolves via exact-key lookup against the registry.
| Model ID | Label | Lifecycle |
|---|---|---|
| dall-e-2 | OpenAI DALL-E 2 (legacy) | legacy |
| dall-e-3 | OpenAI DALL-E 3 (deprecated) | deprecated |
| gpt-image-1 | OpenAI GPT Image 1 (legacy) | legacy |
| gpt-image-1-mini | OpenAI GPT Image 1 Mini | current |
| gpt-image-1.5 | OpenAI GPT Image 1.5 | current |
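A minimal sketch of what "exact-key lookup" means for a closed-list provider, using only the lifecycle column from the table above. The dict name `OPENAI_LIFECYCLE` and both helpers are hypothetical; the real `MODEL_STYLES` entries in `model_styles.py` carry more fields than this.

```python
# Hypothetical registry: lifecycle values transcribed from the OpenAI table.
OPENAI_LIFECYCLE = {
    "dall-e-2": "legacy",
    "dall-e-3": "deprecated",
    "gpt-image-1": "legacy",
    "gpt-image-1-mini": "current",
    "gpt-image-1.5": "current",
}


def lookup_lifecycle(model_id: str):
    """Exact-key lookup: no case folding, prefix matching, or regex."""
    return OPENAI_LIFECYCLE.get(model_id)


def current_models() -> list:
    """Model IDs whose lifecycle is 'current', sorted for stable output."""
    return sorted(m for m, status in OPENAI_LIFECYCLE.items() if status == "current")
```

Because the lookup is exact, a differently cased ID such as `GPT-Image-1.5` returns `None` rather than falling through to a fuzzy match; that behaviour is what separates these providers from the regex-ordered SD WebUI table.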
OpenAI DALL-E 2 (legacy): legacy
Use only for inpainting on legacy flows. Prefer gpt-image-1.5 for any new generation work.
Best for: Older OpenAI model retained mostly for inpainting / mask edits at low cost. Limited style fidelity vs current gpt-image-* family. Square output only (256x256, 512x512, or 1024x1024). Useful for cheap edits where new code paths can't be added.
Avoid: Don't use for new generation work. No transparent backgrounds, no aspect ratios beyond 1:1, no in-image text, no negative prompts. Quality is well below current OpenAI models.
Good prompt: Inpaint a missing hand on an existing 1024x1024 image (mask edit only — not for new-from-scratch generation)
Bad prompt: Detailed photoreal product shot for a marketing campaign (use gpt-image-1.5 instead — DALL-E 2 quality is well behind)
OpenAI DALL-E 3 (deprecated): deprecated
OpenAI API removal scheduled 2026-05-12. Migrate to gpt-image-1.5 for new long-lived workflows.
Best for: Strong creative interpretation and excellent compliance with multi-clause prompts. Good for stylised illustrations, cinematic concept art, and vivid-style hero images where you want the model to embellish. The natural style produces flatter, more photoreal output suitable for stock-photo and logo work.
Avoid: Don't use for in-image text — text rendering is unreliable. No edits, no inpainting, no transparent background, no negative prompts, no aspect ratios beyond 1024x1024 / 1024x1792 / 1792x1024. Cannot render named real people. Will silently rewrite short prompts — inspect revised_prompt to see what was actually used.
Good prompt: A wide cinematic painting in the style of Thomas Cole's "Desolation" — overgrown classical ruins on a cliff at dusk, vines reclaiming marble columns, single shaft of warm light. Style: natural.
Bad prompt: A birthday cake that says "HAPPY BIRTHDAY SARAH" in elegant script (DALL-E 3 will likely garble the text; route to gpt-image-1.5 for typography-critical work)
OpenAI GPT Image 1 (legacy): legacy
Newer OpenAI image models (gpt-image-1.5) offer better fidelity. This model remains available for compatibility.
Best for: Earlier flagship; same descriptive-paragraph prompt grammar as gpt-image-1.5. Supports transparent backgrounds and the same three sizes as gpt-image-1.5 (1024x1024, 1024x1536, 1536x1024). Still capable for general work; newer siblings give better fidelity and instruction following.
Avoid: Avoid CLIP-style tag dumps. No --no negative-prompt syntax. Real-named-people likenesses are filtered. Prefer gpt-image-1.5 for new long-lived workflows.
Good prompt: Studio portrait of a senior watchmaker examining a movement with a loupe, warm rim light from a window, shallow depth of field, no text in frame.
Bad prompt: watchmaker, masterpiece, 8k, ultradetailed (tag-soup style — use descriptive sentences instead)
OpenAI GPT Image 1 Mini
Best for: Cheaper variant of gpt-image-1 with similar capabilities at a lower per-image cost. Same descriptive-paragraph grammar; same three sizes (1024x1024, 1024x1536, 1536x1024). Good default for high-volume drafts and iteration where small quality differences vs the full model are acceptable.
Avoid: Avoid CLIP-style tag dumps. No --no negative-prompt syntax. Same content filters as the full model. For final-grade output where small quality differences matter, prefer gpt-image-1.5.
Good prompt: Quick draft sketch: a fox curled up on a windowsill at dusk, soft watercolour palette, simple background.
Bad prompt: fox, watercolour, ((masterpiece)), [blurry] (parenthetical weight syntax is SD-specific; gpt-image-* ignores it)
OpenAI GPT Image 1.5
Best for: Current OpenAI flagship image model. Strong instruction following for photorealistic shots, illustrations, product mockups, infographics, and marketing assets where layout and typography matter. Excels with descriptive paragraphs ordered scene → subject → details → constraints, and with text in image given in quotes with explicit typography hints. Supports transparent backgrounds and 1024x1024 / 1024x1536 / 1536x1024.
Avoid: Avoid CLIP-style comma-separated tag dumps — they underperform vs descriptive sentences. Don't use --no negative-prompt syntax; describe exclusions positively. Long, multi-element scenes with strict spatial composition can drift. Real-named-people likenesses are filtered. No identity consistency across calls.
Good prompt: Editorial product photo of a beige ceramic coffee mug on a worn oak table, shallow depth of field, soft window light from the left, warm muted palette. No text, no logos.
Bad prompt: coffee mug, masterpiece, 8k, hyperdetailed, --no text (tag-soup + unsupported negative-prompt syntax — wastes tokens, mostly ignored)
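The ordering advice for gpt-image-1.5 (scene, then subject, then details, then constraints, with in-image text quoted and given a typography hint) can be captured in a small prompt builder. This is a hypothetical helper, not part of image_generation_mcp; it only joins sentences and makes no API call.

```python
from typing import List, Optional


def _sentence(text: str) -> str:
    """Trim a fragment and make sure it ends with terminal punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


def build_gpt_image_prompt(
    scene: str,
    subject: str,
    details: Optional[List[str]] = None,
    constraints: Optional[List[str]] = None,
    text_in_image: Optional[str] = None,
    typography: Optional[str] = None,
) -> str:
    """Join prose in scene -> subject -> details -> constraints order."""
    parts = [_sentence(scene), _sentence(subject)]
    parts += [_sentence(d) for d in details or []]
    if text_in_image is not None:
        # Literal in-image text goes in quotes, with an explicit typography hint.
        hint = f" in {typography}" if typography else ""
        parts.append(f'The image contains the text "{text_in_image}"{hint}.')
    # Exclusions are written as plain sentences; no --no syntax.
    parts += [_sentence(c) for c in constraints or []]
    return " ".join(parts)
```

For example, `build_gpt_image_prompt("Editorial product photo on a worn oak table", "A beige ceramic coffee mug", details=["Soft window light from the left"], constraints=["No text, no logos"])` yields a plain descriptive paragraph close to the good prompt above, with the exclusion stated positively at the end.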
Gemini
Models exposed by the gemini provider. Each model resolves via exact-key lookup against the registry.
| Model ID | Label | Lifecycle |
|---|---|---|
| gemini-2.5-flash-image | Gemini 2.5 Flash Image (Nano Banana) | current |
| gemini-3-pro-image-preview | Gemini 3 Pro Image (preview) | current |
| gemini-3.1-flash-image-preview | Gemini 3.1 Flash Image (preview) | current |
Gemini 2.5 Flash Image (Nano Banana)
Best for: Fast, low-latency generation and conversational image editing — multi-turn refinement, multi-image compositing (up to 3 inputs), character consistency across iterations, in-image text, and natural-language local edits ('remove the stain', 'change pose to running'). Strong photorealism with photographic vocabulary (lens, lighting, aspect ratio). Supports 10 aspect ratios from 21:9 cinematic to 9:16 vertical. Cheap (~$0.04/image) — good default for high-volume ideation.
Avoid: Avoid Stable-Diffusion-style comma-separated tag lists — performance drops vs descriptive sentences. No negative-prompt parameter; phrase exclusions positively. Do not rely on transparent backgrounds. All outputs carry an invisible SynthID watermark — unsuitable for workflows requiring unmarked pixels. Not the strongest pick for very dense professional typography. Limit reference inputs to 3 images.
Good prompt: A worn leather-bound journal lying open on a rainy windowsill at dusk. Soft cyan rim-light from outside, warm tungsten lamp on the right. The left page reads, in handwritten script: "Day 42 — still no signal." Shot on 50mm, shallow depth of field. 16:9.
Bad prompt: journal, rainy, moody, cinematic, 8k, masterpiece, --no people (tags + unsupported negative — Google docs explicitly call this the wrong pattern)
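Because Gemini takes a ratio string rather than arbitrary pixel sizes, a caller holding a target width and height has to snap it to a supported ratio, and should also respect the three-image reference limit noted above. The sketch below assumes the ten ratios documented for Gemini 2.5 Flash Image at the time of writing (confirm against the current Gemini API docs); the constant and helper names are hypothetical.

```python
import math

# Assumption: the ten ratios documented for Gemini 2.5 Flash Image.
GEMINI_ASPECT_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)
MAX_REFERENCE_IMAGES = 3  # from the "limit reference inputs to 3 images" note


def _ratio_value(ratio: str) -> float:
    """Convert "W:H" into the float W / H."""
    w, h = ratio.split(":")
    return int(w) / int(h)


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Snap a pixel size to the closest supported ratio.

    Ratios are compared in log space so that 2:1 and 1:2 are treated as
    equally far from 1:1.
    """
    target = math.log(width / height)
    return min(
        GEMINI_ASPECT_RATIOS,
        key=lambda r: abs(math.log(_ratio_value(r)) - target),
    )


def check_reference_images(images: list) -> None:
    """Reject requests that exceed the reference-image limit."""
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"{len(images)} reference images given; limit is {MAX_REFERENCE_IMAGES}"
        )
```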
Gemini 3 Pro Image (preview)
Best for: Higher-fidelity Pro tier with reasoning, suited to demanding production-grade work where 2.5 Flash falls short. Better at dense typography and strict brand compliance. Same prompt grammar as the Flash variants; preview-tier so behaviour can change.
Avoid: Don't use for cheap drafts: cost per image is materially higher than Flash. Same SynthID-watermark caveat. Tag-soup still underperforms. Preview-tier: the API surface and model behaviour are not guaranteed to stay stable.
Good prompt: Magazine cover layout for a quarterly architecture journal: headline 'Concrete Futures' in bold serif, subhead 'Brutalism Reconsidered', central full-bleed photo of a weathered Le Corbusier facade at golden hour. 3:4.
Bad prompt: magazine, architecture, brutalism (single-line keyword set — Gemini Pro shines on richly described prompts; underprompting wastes the cost premium)
Gemini 3.1 Flash Image (preview)
Best for: Successor to 2.5 Flash with reasoning ('thinking') support. Good for prompts that benefit from layout reasoning — infographics, structured layouts, multi-element compositions where spatial relationships matter. Same descriptive-prose grammar as 2.5 Flash; same 10 aspect ratios.
Avoid: Avoid tag-soup; same SynthID-watermark caveat as 2.5 Flash. Preview-tier model: the schema may shift before GA, and output behaviour may not be stable between releases. Don't pin production workflows to it without a fallback.
Good prompt: A clean infographic explaining the water cycle on a soft pastel background, four labelled stages arranged in a circle, minimalist line illustration with gentle shadows. 4:3.
Bad prompt: water cycle, infographic, 8k, ultra-detailed (tag style — use descriptive sentences for Gemini)
Placeholder
Models exposed by the placeholder provider. Each model resolves via exact-key lookup against the registry.
| Model ID | Label | Lifecycle |
|---|---|---|
| placeholder | Solid-color placeholder | current |
Solid-color placeholder
Best for: Returns a deterministic solid-color PNG at the requested aspect ratio. Use for testing pipeline plumbing, mocking generation in unit tests, or zero-cost demos without invoking a real provider.
Avoid: Not a real image generator. Do not use for any task that requires actual image content.
Good prompt: any prompt — placeholder ignores prompt content and emits a solid-color PNG at the requested size
Bad prompt: any prompt where the user actually wants a generated image (use openai, gemini, or sd_webui instead)
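To illustrate why a solid-color PNG can be fully deterministic, here is a stdlib-only sketch that encodes one. It is not the placeholder provider's implementation: the default grey, the `size_for_ratio` rule (longer side fixed at 1024 px), and all function names are assumptions made for the example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def size_for_ratio(ratio: str, long_side: int = 1024) -> tuple:
    """Hypothetical sizing rule: the longer side is `long_side` pixels."""
    w, h = (int(x) for x in ratio.split(":"))
    if w >= h:
        return long_side, round(long_side * h / w)
    return round(long_side * w / h), long_side


def solid_color_png(width: int, height: int, rgb=(128, 128, 128)) -> bytes:
    """Encode an 8-bit RGB PNG in which every pixel has the same colour.

    The bytes depend only on (width, height, rgb), never on a prompt.
    """
    # Every scanline starts with filter type 0 (None), then RGB triples.
    scanline = b"\x00" + bytes(rgb) * width
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), default methods.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    idat = zlib.compress(scanline * height)
    return (
        PNG_SIGNATURE
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", idat)
        + _png_chunk(b"IEND", b"")
    )
```

Calling `solid_color_png(*size_for_ratio("16:9"))` twice returns identical bytes, which is the property that makes such output useful for snapshot-style unit tests.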
SD WebUI
SD WebUI checkpoints resolve via the regex-ordered CHECKPOINT_PATTERNS table. First match wins — patterns are ordered specific-before-generic, with an empty-pattern fallback as the final entry to guarantee a non-None match for every checkpoint name.
Pattern catalog (in match order)
| # | Pattern | Label |
|---|---|---|
| 1 | `flux[._-]?2` | FLUX.2 (current photorealistic flagship) |
| 2 | `flux.*schnell\|schnell.*flux` | Flux Schnell (1-4 step distilled) |
| 3 | `flux` | Flux 1 dev/pro (photorealistic / highly-detailed) |
| 4 | `pony\|score_9\|autismmix` | Pony Diffusion XL (mandatory score_* tag prefix) |
| 5 | `illustrious\|noob.?ai` | Illustrious-XL / NoobAI-XL (modern anime SDXL bases) |
| 6 | `animagine` | Animagine XL (anime SDXL) |
| 7 | `coloring.?book` | Coloring Book (line-art SD1.5) |
| 8 | `juggernaut(?!.*illustrious)` | Juggernaut XL (photorealistic SDXL) |
| 9 | `dreamshaperxl.*lightning\|dreamshaperxl.*alpha` | DreamShaperXL Lightning / Alpha (fast fantasy SDXL) |
| 10 | `dreamshaperxl\|dreamshaper.*xl` | DreamShaperXL (versatile fantasy SDXL) |
| 11 | `dreamshaper` | DreamShaper (versatile SD1.5) |
| 12 | `sd3\|sd_3\|sd3_5\|sd3\.5` | SD 3 / 3.5 (triple-encoder; natural-language) |
| 13 | `sd_xl_base\|sdxl_base\|sdxl-base` | SDXL Base (general-purpose SDXL) |
| 14 | `realvisxl\|realvis` | RealVisXL (photorealistic SDXL) |
| 15 | `v1[-_]5\|sd[-_]?1[-._]?5` | SD 1.5 (general-purpose base) |
| 16 | (default fallback) | Unknown checkpoint (SD general-purpose defaults) |
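The first-match-wins resolution over the table above can be sketched as a linear scan. The patterns and labels are transcribed from the table; the list name `PATTERNS` is hypothetical, and the sketch assumes a case-insensitive `re.search` against the checkpoint filename, which may differ from how `model_styles.py` compiles `CHECKPOINT_PATTERNS`.

```python
import re

# (pattern, label) pairs in match order; the empty pattern is the fallback.
PATTERNS = [
    (r"flux[._-]?2", "FLUX.2 (current photorealistic flagship)"),
    (r"flux.*schnell|schnell.*flux", "Flux Schnell (1-4 step distilled)"),
    (r"flux", "Flux 1 dev/pro (photorealistic / highly-detailed)"),
    (r"pony|score_9|autismmix", "Pony Diffusion XL (mandatory score_* tag prefix)"),
    (r"illustrious|noob.?ai", "Illustrious-XL / NoobAI-XL (modern anime SDXL bases)"),
    (r"animagine", "Animagine XL (anime SDXL)"),
    (r"coloring.?book", "Coloring Book (line-art SD1.5)"),
    (r"juggernaut(?!.*illustrious)", "Juggernaut XL (photorealistic SDXL)"),
    (r"dreamshaperxl.*lightning|dreamshaperxl.*alpha",
     "DreamShaperXL Lightning / Alpha (fast fantasy SDXL)"),
    (r"dreamshaperxl|dreamshaper.*xl", "DreamShaperXL (versatile fantasy SDXL)"),
    (r"dreamshaper", "DreamShaper (versatile SD1.5)"),
    (r"sd3|sd_3|sd3_5|sd3\.5", "SD 3 / 3.5 (triple-encoder; natural-language)"),
    (r"sd_xl_base|sdxl_base|sdxl-base", "SDXL Base (general-purpose SDXL)"),
    (r"realvisxl|realvis", "RealVisXL (photorealistic SDXL)"),
    (r"v1[-_]5|sd[-_]?1[-._]?5", "SD 1.5 (general-purpose base)"),
    (r"", "Unknown checkpoint (SD general-purpose defaults)"),
]

_COMPILED = [(re.compile(p, re.IGNORECASE), label) for p, label in PATTERNS]


def resolve_checkpoint(name: str) -> str:
    """Return the label of the first pattern found anywhere in `name`."""
    for pattern, label in _COMPILED:
        if pattern.search(name):
            return label
    # The empty fallback pattern matches every string, so this never runs.
    raise RuntimeError("unreachable: fallback pattern always matches")
```

Ordering is what keeps `flux1-schnell.safetensors` from being claimed by the generic `flux` row and `dreamshaperXL_lightning...` from being claimed by the plain `dreamshaperxl` row.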
Pattern: flux[._-]?2
FLUX.2 (current photorealistic flagship)
Best for: Newest BFL Flux generation. Photorealistic imagery with extreme fine detail; coherent in-scene text; strong architectural and product photography. Natural-language prose prompts; the text encoder is a vision-language model (Mistral Small 3.x in FLUX.2 [dev]) rather than Flux 1's T5.
Avoid: FLUX.2 does not support negative prompts (CFG=1 distilled). Anime / cel-shaded / low-detail illustration styles fight the model. Don't use SD-style weighted parens or BREAK.
Good prompt: style="cinematic urban photography", medium="digital photograph with shallow DOF"
Bad prompt: style="watercolor wash", medium="hand-painted ink" (FLUX.2 is tuned for photorealism; painterly media will fight the model)
Pattern: flux.*schnell|schnell.*flux
Flux Schnell (1-4 step distilled)
Best for: Distilled Flux variant for very fast drafts (1-4 steps, CFG=1). Same natural-language prompt style as Flux dev. Best for ideation passes where iteration speed dominates.
Avoid: No negative prompts (CFG=1, fully distilled). Quality below Flux dev / FLUX.2; don't use for final-grade output. Highly detailed textures suffer at 1-4 step counts.
Good prompt: style="cinematic environment concept", medium="painterly digital art, broad strokes" (4 steps)
Bad prompt: style="hyperreal skin pores at 4K", medium="macro photograph" (Schnell sacrifices fine detail for speed)
Pattern: flux
Flux 1 dev/pro (photorealistic / highly-detailed)
Best for: Photorealistic imagery, extreme fine detail, architectural photography, natural lighting, product shots, documentary portraiture, coherent text in scene. Natural-language prose; T5 encoder; CFG=1 distilled.
Avoid: Negative prompts are unsupported (CFG=1 distilled). Anime / cel-shading / heavy painterly textures fight the model. Don't use SD-style weighted parens or BREAK.
Good prompt: style="cinematic urban photography", medium="digital photograph with shallow DOF"
Bad prompt: style="watercolor wash", medium="hand-painted ink" (Flux is tuned for photorealism; painterly media will fight the model)
Pattern: pony|score_9|autismmix
Pony Diffusion XL (mandatory score_* tag prefix)
Best for: Highly versatile SDXL fine-tune. Excellent for stylised character art, anime, and varied art styles when prompted with the mandatory leading tag block: 'score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, source_anime, rating_safe' (or source_pony/source_furry, rating_questionable, etc.). Without the score_* prefix, output quality collapses.
Avoid: Bare prompts without the score_* prefix produce visibly degraded results. Photorealistic catalog work — Pony is stylised by design. Natural-language prose underperforms vs Booru-style tag grammar.
Good prompt: score_9, score_8_up, score_7_up, score_6_up, source_anime, rating_safe, 1girl, school uniform, cherry blossoms, soft lighting
Bad prompt: 1girl, anime, cherry blossoms (missing score_* prefix — output collapses)
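Since output collapses without the leading tag block, a caller can normalise every Pony prompt before sending it. This hypothetical helper prepends the full `score_*` block from the entry above, keeps any `source_*` / `rating_*` tag the caller already chose (defaulting to `source_anime` and `rating_safe`), and is idempotent.

```python
PONY_SCORE_TAGS = (
    "score_9", "score_8_up", "score_7_up", "score_6_up", "score_5_up", "score_4_up",
)


def add_pony_prefix(prompt: str, source: str = "source_anime",
                    rating: str = "rating_safe") -> str:
    """Rebuild a comma-separated Pony prompt with the mandatory leading block."""
    tags = [t.strip() for t in prompt.split(",") if t.strip()]
    # Keep the caller's own source_* / rating_* choice when present.
    source = next((t for t in tags if t.startswith("source_")), source)
    rating = next((t for t in tags if t.startswith("rating_")), rating)
    # Content tags are everything outside the leading block.
    body = [t for t in tags if not t.startswith(("score_", "source_", "rating_"))]
    return ", ".join([*PONY_SCORE_TAGS, source, rating, *body])
```

Applied to the bad prompt above, `add_pony_prefix("1girl, anime, cherry blossoms")` returns the same content tags behind the full score, source, and rating block.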
Pattern: illustrious|noob.?ai
Illustrious-XL / NoobAI-XL (modern anime SDXL bases)
Best for: Current-generation anime SDXL bases that have largely supplanted Animagine in 2025-26. Danbooru-style tag grammar (artist and character tags; NoobAI also understands e621-style tags). Much larger character/style dataset than Animagine 3.x. Strong cel-shading and expressive character art.
Avoid: Photorealism — anime-specialised. NoobAI v-prediction variants need the v-prediction sampler config; wrong sampler produces noise. Natural-language prose underperforms vs tag grammar.
Good prompt: 1girl, long hair, blue eyes, school uniform, cherry blossoms, masterpiece, best quality, very aesthetic
Bad prompt: style="documentary photograph", medium="35mm film" (Illustrious/NoobAI are anime-specialised; photographic styles produce off-distribution outputs)
Pattern: animagine
Animagine XL (anime SDXL)
Best for: Anime illustration base. Danbooru-style tag vocabulary, clean cel shading, expressive character art, vivid saturated palette, manga panel compositions. Animagine 4.x recommends '1girl/1boy, character (series), rating, ..., masterpiece, high score, great score, absurdres'.
Avoid: Photorealism, photography-style lighting, gritty texture, oil painting, detailed backgrounds without anime stylisation. For broader character/style coverage, consider Illustrious-XL or NoobAI-XL.
Good prompt: 1girl, long hair, school uniform, cherry blossoms, masterpiece, high score, absurdres
Bad prompt: style="documentary photograph", medium="35mm film" (Animagine is anime-specialised; photographic styles produce off-distribution outputs)
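The Animagine 4.x ordering quoted above (subject count, character with series, rating, general tags, then the quality block) can be assembled mechanically. This is a hypothetical helper; the default rating tag `safe` is an assumption, so check the Animagine model card for the rating vocabulary it expects.

```python
from typing import List, Optional

ANIMAGINE_QUALITY_TAGS = ("masterpiece", "high score", "great score", "absurdres")


def animagine_prompt(subject: str, tags: List[str],
                     character: Optional[str] = None,
                     series: Optional[str] = None,
                     rating: str = "safe") -> str:
    """Order tags as: subject, character (series), rating, tags..., quality."""
    parts = [subject]
    if character:
        parts.append(f"{character} ({series})" if series else character)
    parts.append(rating)
    # General tags keep caller order; quality tags are moved to the end.
    parts += [t for t in tags if t not in ANIMAGINE_QUALITY_TAGS]
    parts += ANIMAGINE_QUALITY_TAGS
    return ", ".join(parts)
```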
Pattern: coloring.?book
Coloring Book (line-art SD1.5)
Best for: Clean outlines on white background, no fill colors, strong linework, simple shapes, children's-book-friendly compositions, decorative borders.
Avoid: Photorealism, color renders, painterly textures, complex shading, dark backgrounds, photographic lighting.
Good prompt: style="bold ink linework", medium="black-and-white outline drawing"
Bad prompt: style="photorealistic portrait", medium="oil paint with rich color" (this checkpoint is fine-tuned for line-art only; color renders will fail)
Pattern: juggernaut(?!.*illustrious)
Juggernaut XL (photorealistic SDXL)
Best for: Photorealistic portraits, cinematic lighting, sharp textural detail, skin pores, fabric weave, dramatic rim lighting, environmental storytelling. Recent Juggernaut X / XI handle some stylised work too.
Avoid: Anime, cartoon, flat illustration. Watercolor and comic-ink styles are weaker than dedicated stylised checkpoints — usable but not the model's strength.
Good prompt: style="gritty photorealistic urban", medium="digital photo"
Bad prompt: style="watercolor wash", medium="traditional ink" (Juggernaut is tuned for photorealism; stylised media will underperform)
Pattern: dreamshaperxl.*lightning|dreamshaperxl.*alpha
DreamShaperXL Lightning / Alpha (fast fantasy SDXL)
Best for: Fantasy concept art, painterly illustration, vibrant color, dramatic character portraits. Run at 3-6 steps with CFG ~2 and DPM++ SDE Karras (per Civitai). Fast ideation pass for stylised work.
Avoid: Photorealism (stylised by design), highly detailed textures at very low step counts, strict architectural accuracy.
Good prompt: style="dramatic fantasy concept art", medium="painterly digital illustration"
Bad prompt: style="hyperrealistic skin detail at 4K", medium="macro photograph" (Lightning checkpoints sacrifice fine detail for speed)
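The recommended Lightning settings (3-6 steps, CFG around 2, DPM++ SDE Karras) translate directly into a request body for the AUTOMATIC1111 WebUI `/sdapi/v1/txt2img` endpoint. A minimal sketch: the field names are standard txt2img parameters, but the function name and the 1024x1024 default are assumptions, and sampler naming varies between WebUI versions (newer builds expose the Karras schedule as a separate scheduler setting), so check `/sdapi/v1/samplers` on your instance.

```python
def dreamshaper_lightning_payload(prompt: str, steps: int = 4,
                                  width: int = 1024, height: int = 1024) -> dict:
    """Build a txt2img JSON body with the Lightning-recommended settings."""
    if not 3 <= steps <= 6:
        raise ValueError(f"steps={steps}: Lightning checkpoints target 3-6 steps")
    return {
        "prompt": prompt,
        "steps": steps,
        "cfg_scale": 2.0,  # low CFG; higher values over-saturate distilled models
        "sampler_name": "DPM++ SDE Karras",
        "width": width,
        "height": height,
    }
```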
Pattern: dreamshaperxl|dreamshaper.*xl
DreamShaperXL (versatile fantasy SDXL)
Best for: Fantasy illustration, painterly portraits, concept-art style, stylised environments, strong use of negative space.
Avoid: Strict photorealism, clinical document photography, flat-color infographic styles.
Good prompt: style="painterly fantasy illustration", medium="digital concept art"
Bad prompt: style="clinical product photography", medium="catalog studio shot" (DreamShaperXL is stylised by design; strict photo-real fights the model)
Pattern: dreamshaper
DreamShaper (versatile SD1.5)
Best for: General-purpose stylised illustration, fantasy character art, soft painterly lighting, portrait and environmental compositions; notably versatile — adapt style tags rather than leaning on a single category.
Avoid: Extreme photorealism (slightly stylised by design), Danbooru/anime tag grammar (use natural descriptors).
Good prompt: style="painterly fantasy character portrait", medium="soft digital illustration"
Bad prompt: style="Danbooru anime tags", medium="cel-shading" (DreamShaper SD1.5 expects natural descriptors, not anime tag grammar)
Pattern: sd3|sd_3|sd3_5|sd3\.5
SD 3 / 3.5 (triple-encoder; natural-language)
Best for: Triple-encoder architecture (CLIP-L + OpenCLIP-bigG + T5-XXL). Benefits from natural-language prose for the T5 stream — same prose-friendly profile as Flux. Supports negative prompts (unlike Flux). 3.5 Large Turbo is 4-step distilled.
Avoid: CLIP tag-soup underperforms vs descriptive prose. Architecturally distinct from SDXL — don't expect SDXL fine-tune behaviour to carry over.
Good prompt: A weathered fishing boat moored at a stone harbour at dawn, gulls circling overhead, soft cool light, painterly yet photoreal, 16:9 cinematic framing.
Bad prompt: fishing boat, harbour, dawn, masterpiece, 8k, ((highly detailed)) (tag-soup with weighted parens — SD3 wants prose, parens are SDXL/SD1.5 syntax)
Pattern: sd_xl_base|sdxl_base|sdxl-base
SDXL Base (general-purpose SDXL)
Best for: Broad style range, photography, illustration, concept art. Responds well to explicit style tokens. Works at 25-30+ steps for coherence.
Avoid: Anime-specific Danbooru vocabulary without style priming. Very low step counts (needs 25-30+ for coherence). The SDXL refiner is rarely used in 2026 workflows; modern fine-tunes drop it in favour of hires-fix / upscalers.
Good prompt: style="cinematic illustration with explicit style tokens", medium="digital art"
Bad prompt: style="anime without style priming", medium="bare Danbooru tags" (SDXL base needs explicit style direction; bare anime grammar underperforms)
Pattern: realvisxl|realvis
RealVisXL (photorealistic SDXL)
Best for: Current-generation SDXL photorealism fine-tune. Sharp textural detail, skin/fabric/material fidelity, cinematic lighting. Has eclipsed Juggernaut share in 2026 SDXL photoreal work.
Avoid: Anime, cel-shading, watercolor, comic-ink. Painterly stylisation fights the photorealistic tuning.
Good prompt: style="documentary photorealism", medium="digital photo, sharp focus, natural light"
Bad prompt: style="cel-shaded anime", medium="flat colour" (RealVisXL is photoreal-tuned; stylised media underperforms)
Pattern: v1[-_]5|sd[-_]?1[-._]?5
SD 1.5 (general-purpose base)
Best for: Broad style range. Native resolution is 512x512 (a 64x64 latent); commonly used at 512x768 / 768x512 before hires-fix. With hires-fix or upscaler chains it routinely produces 1024x1536 and larger. Well-supported by community LoRAs.
Avoid: Photorealistic skin detail at high resolution without hires-fix; SDXL-native aspect ratios. Don't expect SDXL-tier coherence at SDXL resolutions without upscaling.
Good prompt: style="watercolor portraiture", medium="ink illustration"
Bad prompt: style="hyperrealistic skin at 1024px", medium="macro studio photograph" (SD 1.5's native resolution is 512x512; use SDXL or run hires-fix)
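The SD 1.5 resolution arithmetic is easy to check: the VAE downsamples each side by 8, so a 512x768 base image is a 64x96 latent, and a 2x hires-fix pass lands at 1024x1536. The sketch below builds an AUTOMATIC1111-style `txt2img` body with hires-fix enabled (`enable_hr`, `hr_scale`, `hr_upscaler`, `denoising_strength` are txt2img parameters); the function names, the `Latent` upscaler choice, and the 0.5 denoising default are illustrative assumptions, not tuned values.

```python
VAE_DOWNSAMPLE = 8  # SD 1.x VAE maps each 8x8 pixel block to one latent cell


def latent_size(width: int, height: int) -> tuple:
    """Latent grid size for a given pixel size."""
    if width % VAE_DOWNSAMPLE or height % VAE_DOWNSAMPLE:
        raise ValueError("width and height must be multiples of 8")
    return width // VAE_DOWNSAMPLE, height // VAE_DOWNSAMPLE


def sd15_hires_payload(prompt: str, width: int = 512, height: int = 768,
                       hr_scale: float = 2.0,
                       denoising_strength: float = 0.5) -> tuple:
    """Return (txt2img body with hires-fix, final pixel size)."""
    payload = {
        "prompt": prompt,
        "width": width,  # first pass stays near the native 512px
        "height": height,
        "enable_hr": True,
        "hr_scale": hr_scale,
        "hr_upscaler": "Latent",
        "denoising_strength": denoising_strength,
    }
    final_size = (round(width * hr_scale), round(height * hr_scale))
    return payload, final_size
```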
Pattern: (default fallback: empty pattern)
Unknown checkpoint (SD general-purpose defaults)
Best for: Stable Diffusion generally excels at stylised imagery, fantasy environments, and character portraiture. Use explicit style tokens (e.g. 'watercolor painting', 'cinematic photograph') for best results.
Avoid: Coherent embedded text and photographic product catalogs without specialised fine-tuning.
Good prompt: style="painterly fantasy illustration with explicit style tokens", medium="digital concept art"
Bad prompt: style="coherent embedded text", medium="document scan with readable signage" (Stable Diffusion generally cannot render legible text)