Gemini content generation (native format)

Call Gemini multimodal models (e.g. Nano Banana) with native generateContent: text, images, and optional “thinking” traces.

Basics

  • Endpoint: POST /v1beta/models/{model}:generateContent
  • Path parameter: model, e.g. gemini-3-pro-image-preview.
  • Auth: Bearer token in the Authorization header
  • Content-Type: application/json
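Putting the basics together, the request can be sketched in Python with the standard library. The base URL and API key below are placeholders (not values from this page); only the path shape, auth header, and content type come from the bullets above.

```python
import json
import urllib.request

# Placeholder values: substitute your actual host and key.
BASE_URL = "https://example-host/v1beta"
MODEL = "gemini-3-pro-image-preview"
API_KEY = "YOUR_API_KEY"

def build_request(body: dict) -> urllib.request.Request:
    """Build the POST request for :generateContent with bearer auth."""
    url = f"{BASE_URL}/models/{MODEL}:generateContent"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a single `urllib.request.urlopen(build_request(body))` call.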

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| contents | array | Yes | Conversation turns. |
| └─ role | string | No | e.g. user. |
| └─ parts | array | Yes | Message parts. |
| └─ └─ text | string | Yes | Prompt / text input. |
| generationConfig | object | Yes | Generation settings. |
| └─ responseModalities | array | Yes | TEXT, IMAGE, etc. |
| └─ thinkingConfig | object | No | Thinking / chain-of-thought. |
| └─ └─ includeThoughts | boolean | No | Return model “thoughts” before the answer. |
| └─ imageConfig | object | Yes | Image settings. |
| └─ └─ aspectRatio | string | Yes | e.g. 16:9, 1:1, 4:3. |
| └─ └─ imageSize | string | Yes | e.g. 4K, 1024x1024. |
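The body can be assembled programmatically. This is a minimal sketch using only the field names from the table above; the default aspect ratio and image size are illustrative, not API defaults.

```python
def make_body(prompt: str,
              aspect_ratio: str = "1:1",
              image_size: str = "1024x1024",
              include_thoughts: bool = False) -> dict:
    """Assemble a generateContent request body per the field table."""
    body = {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {
                "aspectRatio": aspect_ratio,
                "imageSize": image_size,
            },
        },
    }
    if include_thoughts:
        # thinkingConfig is optional; add it only when thoughts are wanted.
        body["generationConfig"]["thinkingConfig"] = {"includeThoughts": True}
    return body
```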

Response

200 OK
| Field | Type | Description |
| --- | --- | --- |
| candidates | array | Model outputs. |
| └─ content | object | Role + parts (text and/or image). |
| └─ finishReason | string | e.g. STOP. |
| └─ safetyRatings | array | Safety scores. |
| usageMetadata | object | Token usage (prompt, candidates, total). |
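A response with the shape above can be unpacked like this. The sketch assumes image parts arrive as base64-encoded inlineData (a common Gemini convention; the table does not specify the part format, so verify against your deployment).

```python
import base64

def split_parts(response: dict):
    """Separate text and decoded image bytes from the first candidate."""
    texts, images = [], []
    candidate = response["candidates"][0]
    for part in candidate["content"]["parts"]:
        if "text" in part:
            texts.append(part["text"])
        elif "inlineData" in part:  # assumed base64-encoded image payload
            images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images
```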

Example (image)

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "draw a futuristic city at sunset"
        }
      ]
    }
  ],
  "generationConfig": {
    "responseModalities": ["TEXT", "IMAGE"],
    "imageConfig": {
      "aspectRatio": "16:9",
      "imageSize": "4K"
    },
    "thinkingConfig": {
      "includeThoughts": true
    }
  }
}
```

Tips

  1. Thoughts: includeThoughts: true surfaces how the model interprets the prompt, which is useful for debugging complex prompts.
  2. Multimodal: With both TEXT and IMAGE requested, responses often contain explanatory text followed by the image.
  3. Model id: Ensure {model} matches a model your API key can access.
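When includeThoughts is set, thought parts can be separated from the final answer. This sketch assumes each thought part carries a boolean thought flag (Gemini's usual convention for thought summaries; worth verifying against your deployment).

```python
def split_thoughts(parts: list) -> tuple:
    """Partition text parts into (thoughts, answer) by the per-part flag."""
    thoughts = [p["text"] for p in parts if p.get("thought") and "text" in p]
    answers = [p["text"] for p in parts if not p.get("thought") and "text" in p]
    return thoughts, answers
```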