Gemini Provider
Image generation via Google's Gemini native generateContent API with responseModalities=["IMAGE"]. Good for general-purpose generation with a generous free tier.
Setup
Set your Google API key:
```bash
IMAGE_GENERATION_MCP_GOOGLE_API_KEY=AIza...
```
The provider registers automatically when this variable is set. Get a key at Google AI Studio.
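The registration rule above can be pictured with a minimal sketch. The helper name `gemini_enabled` is hypothetical and is not taken from the provider's code; only the environment variable name comes from this page. Treating an empty or whitespace-only value as "not set" is an assumption.

```python
import os

# Variable name as documented in the Setup section.
API_KEY_ENV = "IMAGE_GENERATION_MCP_GOOGLE_API_KEY"


def gemini_enabled(environ=None):
    """Return True when the API key variable holds a non-empty value.

    Hypothetical helper for illustration; the real provider may apply
    different checks before registering itself.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty or whitespace-only value counts as unset.
    return bool(env.get(API_KEY_ENV, "").strip())
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the check easy to exercise without touching the real environment.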
Supported models
| Model | Notes |
|---|---|
| gemini-2.5-flash-image | Default. Stable GA release |
| gemini-3.1-flash-image-preview | Preview. Successor to 2.5 flash; faster |
| gemini-3-pro-image-preview | Preview. Highest quality; best for complex scenes |
Use list_providers to see which models are available on your API key.
Aspect ratios and sizes
Gemini natively supports 14 aspect ratios, all of which are passed through directly:
| Aspect ratio | Notes |
|---|---|
| 1:1 | Square (default) |
| 16:9 | Landscape |
| 9:16 | Portrait |
| 3:2 | Photo landscape |
| 2:3 | Photo portrait |
| 3:4 | Portrait |
| 4:3 | Landscape |
| 4:5 | Portrait |
| 5:4 | Landscape |
| 4:1 | Ultra-wide banner |
| 1:4 | Ultra-tall banner |
| 8:1 | Extreme panorama |
| 1:8 | Extreme vertical |
| 21:9 | Cinematic ultra-wide |
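Because the ratios are passed through unchanged, a caller may want to check a value before sending a request. The following is a minimal sketch of such a check; `check_aspect_ratio` is a hypothetical helper and not part of the provider, and the set simply mirrors the 14 ratios listed above.

```python
# The 14 ratios listed in the table above.
SUPPORTED_ASPECT_RATIOS = frozenset({
    "1:1", "16:9", "9:16", "3:2", "2:3", "3:4", "4:3",
    "4:5", "5:4", "4:1", "1:4", "8:1", "1:8", "21:9",
})


def check_aspect_ratio(ratio="1:1"):
    """Return the ratio unchanged if it is in the documented list.

    Hypothetical client-side check; the provider itself may behave
    differently when it receives an unknown ratio.
    """
    value = ratio.strip()
    if value not in SUPPORTED_ASPECT_RATIOS:
        supported = ", ".join(sorted(SUPPORTED_ASPECT_RATIOS))
        raise ValueError(f"Unsupported aspect ratio {value!r}; expected one of: {supported}")
    return value
```

Failing early on the client side gives a clearer message than a rejected API call would.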
Quality levels
The quality parameter controls resolution, model reasoning, and response modalities:
| Quality | Image size | Thinking | Response modalities | Cost |
|---|---|---|---|---|
| standard | 1K | Minimal (default) | Image only | Free tier |
| hd | 2K | High (model reasons about composition before rendering) | Text + Image | Thinking tokens billed |
How hd works: When quality="hd" is set, the provider enables thinking_level="High" on supported models (gemini-3.1-flash-image-preview, gemini-3-pro-image-preview). The model reasons through the prompt, plans composition, and may generate interim images before producing the final result. This significantly improves output quality for complex prompts with multiple elements, layouts, or text.
Note: gemini-2.5-flash-image does not support thinking — hd still increases resolution to 2K but skips the thinking configuration for this model.
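The quality rules above can be summarised as a small lookup. This is a sketch under stated assumptions, not the provider's actual code: `quality_settings` and its dictionary keys are hypothetical names, `None` stands for "thinking left at the model default", and keeping the Text + Image modalities for gemini-2.5-flash-image in hd mode is an assumption, since this page only states that thinking is skipped for that model.

```python
# Models that this page lists as supporting the thinking configuration.
THINKING_MODELS = frozenset({
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
})


def quality_settings(quality, model):
    """Map a quality level and model to illustrative generation settings."""
    if quality == "standard":
        return {
            "image_size": "1K",
            "thinking_level": None,  # left at the model default
            "response_modalities": ["IMAGE"],
        }
    if quality == "hd":
        return {
            "image_size": "2K",
            # Thinking is only configured on models that support it.
            "thinking_level": "High" if model in THINKING_MODELS else None,
            # Assumption: modalities follow the table for every model.
            "response_modalities": ["TEXT", "IMAGE"],
        }
    raise ValueError(f"Unknown quality {quality!r}; expected 'standard' or 'hd'")
```

For example, `quality_settings("hd", "gemini-2.5-flash-image")` yields a 2K image size with no thinking level, matching the note above.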
Negative prompts
Gemini does not have native negative prompt support. When a negative_prompt is provided, it is appended to the prompt as:
```
Avoid: {negative_prompt}
```
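
The appending step might look like the sketch below. `merge_negative_prompt` is a hypothetical name, and the blank line used as a separator is an assumption; this page only specifies the `Avoid:` prefix.

```python
def merge_negative_prompt(prompt, negative_prompt=None):
    """Append the negative prompt to the main prompt as an 'Avoid:' line.

    Illustrative only: the separator between the two parts is assumed.
    """
    if negative_prompt is None or not negative_prompt.strip():
        # Nothing to avoid, so the prompt is returned unchanged.
        return prompt
    return f"{prompt}\n\nAvoid: {negative_prompt.strip()}"
```

Since the instruction is plain text inside the prompt, the model treats it as guidance rather than a hard constraint.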
Background transparency
Not supported. The background parameter is silently ignored — all images are generated with an opaque background.
Prompt style
Gemini works best with natural language descriptions:
```
A professional product photo of white sneakers on a clean white background,
studio lighting, sharp focus, commercial photography style
```
Avoid CLIP-style tag lists (those work better with Stable Diffusion).
Per-call model selection
The model parameter on generate_image overrides the provider's default model for a single request:
```
generate_image(prompt="...", provider="gemini", model="gemini-2.5-flash-image")
```
Use list_providers to discover available models and their capabilities.
Capability discovery
At startup, the provider returns a static list of known image-capable Gemini models. Unlike OpenAI, the Gemini models.list() API does not reliably filter to image-generation models, so the known model list is maintained in the provider code.
Cost
Gemini has a generous free tier (check Google AI pricing for current limits). The provider is not in paid_providers by default — no confirmation prompt is shown before use. Set IMAGE_GENERATION_MCP_PAID_PROVIDERS=gemini,openai if you want cost confirmation for Gemini.
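As an illustration of how a comma-separated value such as `gemini,openai` could be interpreted, here is a minimal sketch. `parse_paid_providers` is a hypothetical helper, and the whitespace trimming is an assumption. The default list used when the variable is unset is not modelled here because this page does not state it.

```python
def parse_paid_providers(raw):
    """Split a comma-separated provider list into a set of names.

    Illustrative only: the real server may normalise names differently.
    """
    # Skip empty entries such as those produced by a trailing comma.
    return {name.strip() for name in raw.split(",") if name.strip()}


# Example: the value suggested above enables confirmation for Gemini.
paid = parse_paid_providers("gemini,openai")
confirm_gemini = "gemini" in paid
```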
SynthID watermark
All Gemini-generated images include an invisible SynthID watermark added by Google. This is automatic and cannot be disabled.
Error handling
| Error | Cause | Resolution |
|---|---|---|
| Content policy rejection | Prompt violates Gemini safety policy | Modify the prompt to comply with Google's usage policies |
| Connection error | Cannot reach Gemini API | Check network connectivity and API key validity |
| No image in response | Model returned text instead of image | Try rephrasing the prompt or use a different model |
| API error (HTTP 429) | Rate limited | Wait and retry; consider reducing request frequency |
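
The "wait and retry" advice for HTTP 429 is commonly implemented as exponential backoff with jitter. The sketch below shows that pattern on the caller's side; `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on a 429, and the attempt count and delays are arbitrary example values rather than documented limits.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error from the API client."""


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff.

    The delay doubles on every attempt and includes random jitter so that
    parallel callers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error to the caller.
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter exists so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.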