Skip to main content
Skip to main content

Image generation

Beta feature. Learn more.

Image generation lets an agent produce new images from a text prompt or edit images the user has uploaded. The agent picks between generation and editing based on what was asked and the available context.

Enable image generation

Image generation is added through the Add Tools modal in the Agent Builder (not the Capabilities section). Click Add Tools at the bottom of the Agent Builder panel, then add one of the image-model tools — for example OpenAI Image Tools, DALL-E-3, or Stable Diffusion. The agent picks the appropriate one based on the request, or you can restrict it in instructions.

Generation

When the user asks for an image, the agent calls the generation tool with a prompt and returns the resulting image inline. The agent retains a reference to the image in its context so it can describe or reuse it within the same conversation.

Editing

If the user uploads an image and asks for a modification — change a color, add an object, extend a composition — the agent invokes the editing variant of the tool. The output replaces the relevant region or extends the source as requested.

Notes

  • Generated images aren't fed into separate vision analysis automatically. If you need the agent to interpret an image, use vision with a user-uploaded image.
  • Provider content policies apply. Prompts that violate the provider's policy return an error rather than an image.