Gemini Provider
Image generation via Google's Gemini native generateContent API with responseModalities=["IMAGE"]. Good for general-purpose generation with a generous free tier.
Setup
Set your Google API key:
The provider registers automatically when this variable is set. Get a key at Google AI Studio.
Supported models
| Model | Notes |
|---|---|
| gemini-2.5-flash-image | Default: stable GA release |
| gemini-3.1-flash-image-preview | Preview: successor to 2.5 flash; faster |
| gemini-3-pro-image-preview | Preview: highest quality; best for complex scenes |
Use list_providers to see which models are available on your API key.
Aspect ratios and sizes
Gemini natively supports 14 aspect ratios — all are passed through directly:
| Aspect ratio | Notes |
|---|---|
| 1:1 | Square (default) |
| 16:9 | Landscape |
| 9:16 | Portrait |
| 3:2 | Photo landscape |
| 2:3 | Photo portrait |
| 3:4 | Portrait |
| 4:3 | Landscape |
| 4:5 | Portrait |
| 5:4 | Landscape |
| 4:1 | Ultra-wide banner |
| 1:4 | Ultra-tall banner |
| 8:1 | Extreme panorama |
| 1:8 | Extreme vertical |
| 21:9 | Cinematic ultra-wide |
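Because the ratios are passed through unchanged, it can help to validate them on the caller's side before sending a request. The following is a minimal sketch, not the provider's own code: the names `SUPPORTED_ASPECT_RATIOS`, `parse_aspect_ratio`, and `orientation` are hypothetical helpers introduced here for illustration.

```python
from fractions import Fraction

# The 14 aspect ratios listed in the table above.
SUPPORTED_ASPECT_RATIOS = (
    "1:1", "16:9", "9:16", "3:2", "2:3", "3:4", "4:3",
    "4:5", "5:4", "4:1", "1:4", "8:1", "1:8", "21:9",
)


def parse_aspect_ratio(value: str) -> Fraction:
    """Return width/height as a Fraction, or raise ValueError if unsupported."""
    if value not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {value!r}")
    width, height = value.split(":")
    return Fraction(int(width), int(height))


def orientation(value: str) -> str:
    """Classify a supported ratio as square, landscape, or portrait."""
    ratio = parse_aspect_ratio(value)
    if ratio == 1:
        return "square"
    return "landscape" if ratio > 1 else "portrait"
```

Checking against a fixed tuple rather than parsing arbitrary `W:H` strings keeps the caller aligned with the list the provider documents.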
Quality levels
The quality parameter controls resolution, model reasoning, and response modalities:
| Quality | Image size | Thinking | Response modalities | Cost |
|---|---|---|---|---|
| standard | 1K | Minimal (default) | Image only | Free tier |
| hd | 2K | High (model reasons about composition before rendering) | Text + Image | Thinking tokens billed |
How hd works: When quality="hd" is set, the provider enables thinking_level="High" on supported models (gemini-3.1-flash-image-preview, gemini-3-pro-image-preview). The model reasons through the prompt, plans composition, and may generate interim images before producing the final result. This significantly improves output quality for complex prompts with multiple elements, layouts, or text.
Note: gemini-2.5-flash-image does not support thinking — hd still increases resolution to 2K but skips the thinking configuration for this model.
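The quality rules above can be summarized as a small lookup. This is a sketch under stated assumptions rather than the provider's implementation: `QualitySettings` and `resolve_quality` are hypothetical names, `None` is used here to mean "send no thinking configuration", and the response modalities chosen for hd on a model without thinking support are a guess, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Models the text above names as supporting thinking.
THINKING_MODELS = frozenset({
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
})


@dataclass(frozen=True)
class QualitySettings:
    image_size: str
    thinking_level: Optional[str]  # None: no thinking configuration is sent
    response_modalities: Tuple[str, ...]


def resolve_quality(quality: str, model: str) -> QualitySettings:
    """Map a quality level and model name to request settings."""
    if quality == "standard":
        return QualitySettings("1K", None, ("IMAGE",))
    if quality == "hd":
        if model in THINKING_MODELS:
            return QualitySettings("2K", "High", ("TEXT", "IMAGE"))
        # Assumption: models without thinking keep image-only output at 2K.
        return QualitySettings("2K", None, ("IMAGE",))
    raise ValueError(f"unknown quality: {quality!r}")
```

For example, `resolve_quality("hd", "gemini-2.5-flash-image")` yields a 2K size with no thinking level, matching the note above.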
Negative prompts
Gemini does not have native negative prompt support. When a negative_prompt is provided, it is appended to the prompt as:
Background transparency
Not supported. The background parameter is silently ignored — all images are generated with an opaque background.
SynthID watermark
All outputs from the Gemini Image family (Flash + Pro tiers) carry an invisible Google SynthID watermark: a per-pixel signal embedded at generation time that survives common edits (re-encoding, cropping, light filtering) and identifies the image as AI-generated. The provider reports the watermark in its capabilities as watermark: "synthid" on every model entry returned by list_providers.
This means Gemini outputs are not suitable for workflows requiring bit-perfect originals — for example, forensic chain of custody, certain regulatory contexts, or pipelines that hash the raw bytes for content addressing. Pick OpenAI or SD WebUI for those use cases — neither family currently embeds a persistent watermark in output bytes.
The watermark is invisible to the human eye and does not affect image quality or aesthetics; it only matters when the binary integrity of the generated bytes is part of the workflow contract.
Prompt style
Gemini works best with natural language descriptions:
A professional product photo of white sneakers on a clean white background,
studio lighting, sharp focus, commercial photography style
Avoid CLIP-style tag lists (those work better with Stable Diffusion).
Per-call model selection
The model parameter on generate_image overrides the provider's default model for a single request:
Use list_providers to discover available models and their capabilities.
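As an illustration of a per-call override, the sketch below builds the arguments for a generate_image call. It is hypothetical: only the `model` parameter is confirmed by the text above, while the `prompt` key, the `build_generate_image_args` helper, and the client-side model check are assumptions made for this example.

```python
from typing import Dict, Optional

# Model names taken from the "Supported models" table.
KNOWN_MODELS = (
    "gemini-2.5-flash-image",
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
)


def build_generate_image_args(prompt: str, model: Optional[str] = None) -> Dict[str, str]:
    """Build tool arguments; omit `model` to use the provider default."""
    args = {"prompt": prompt}  # assumption: the prompt argument is named "prompt"
    if model is not None:
        if model not in KNOWN_MODELS:
            raise ValueError(f"unknown Gemini image model: {model!r}")
        args["model"] = model
    return args
```

Leaving `model` out of the arguments entirely, rather than sending the default name, lets the provider's configured default apply.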
Capability discovery
At startup, the provider returns a static list of known image-capable Gemini models. Unlike OpenAI, the Gemini models.list() API does not reliably filter to image-generation models, so the known model list is maintained in the provider code.
Cost
Gemini has a generous free tier (check Google AI pricing for current limits). The provider is not in paid_providers by default — no confirmation prompt is shown before use. Set IMAGE_GENERATION_MCP_PAID_PROVIDERS=gemini,openai if you want cost confirmation for Gemini.
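The comma-separated value of IMAGE_GENERATION_MCP_PAID_PROVIDERS could be interpreted roughly as follows. This is a minimal sketch, not the server's actual parsing logic: `paid_providers` and `needs_cost_confirmation` are hypothetical helpers, and treating an unset variable as an empty list is an assumption (the real default list is not shown in this page).

```python
import os
from typing import Mapping, Optional, Set

ENV_VAR = "IMAGE_GENERATION_MCP_PAID_PROVIDERS"


def paid_providers(environ: Optional[Mapping[str, str]] = None) -> Set[str]:
    """Parse the comma-separated provider list, ignoring blanks and case."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "")  # assumption: unset means no paid providers
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def needs_cost_confirmation(provider: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the provider is listed as paid."""
    return provider.lower() in paid_providers(environ)
```

With the value `gemini,openai`, both `needs_cost_confirmation("gemini")` and `needs_cost_confirmation("openai")` would be true under this sketch.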
Error handling
| Error | Cause | Resolution |
|---|---|---|
| Content policy rejection | Prompt violates Gemini safety policy | Modify the prompt to comply with Google's usage policies |
| Connection error | Cannot reach Gemini API | Check network connectivity and API key validity |
| No image in response | Model returned text instead of image | Try rephrasing the prompt or use a different model |
| API error (HTTP 429) | Rate limited | Wait and retry; consider reducing request frequency |
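For the rate-limit row, "wait and retry" is commonly done with exponential backoff plus jitter. The sketch below shows that general technique on the caller's side; it is not part of this provider. `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and `retry_on_rate_limit` is an illustrative helper.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def retry_on_rate_limit(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without actually waiting.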