Skip to main content
Skip to main content

Vision

Beta feature. Learn more.

Vision lets users upload images for an agent to analyze. The agent passes the image to a vision-capable model, which describes, summarizes, or answers questions about what's in it.

Enable vision capabilities

Vision only works with models that support image inputs. If the selected model can't handle image inputs, the upload control on the message composer is disabled. Switch to a vision-capable model in model parameters to re-enable it.

Use vision capabilities

Click the paperclip icon at the bottom-left of the message composer and choose Upload to Provider to attach an image — a screenshot, a photo, a chart, a diagram. Then ask any question that requires reading the image: "What's wrong with this query plan?", "Transcribe the text in this screenshot," or "Compare this dashboard to last week's."

The agent treats the image as part of the message context, so follow-up questions in the same turn can reference what it saw without re-uploading.

Combine vision with other tools

Vision pairs well with the code interpreter for image-driven analysis — for example, the agent reads numbers off a screenshot and then runs Python to compute totals — and with web search when an image references something the model needs to look up.