How AI Vision works
This guide explains how AI Vision processes uploaded images to extract meaningful information and add it to your agent's knowledge base.
What AI Vision does
AI Vision uses advanced visual recognition technology to understand images. It identifies what appears in an image, creates a short natural-language description, and stores that description in your agent's knowledge base.
Once processed, your agent can use this visual information just like any other data source, helping it provide richer and more accurate responses.

What AI Vision can interpret
AI Vision can effectively process:
- Technical diagrams and schematics
- Charts and graphs
- Photographs
- Illustrations
- Handwritten text
- Screenshots
- Product images
- Any other image type
Note: AI Vision can also perform OCR (Optical Character Recognition) to extract text from images, with best results in English.
How the process works
- Image upload: You upload one or more images through the file upload modal or when creating a new agent.
- Image processing: The images are processed using OpenAI's vision models, which analyze the visual content in detail.
- AI analysis and description: The system interprets what's in the image (such as objects, text, or diagrams) and generates a clear, human-readable summary.
- Knowledge base integration: The generated description is automatically added to your agent's knowledge base, where it functions as a regular content source.
- Response reference and citations: When your agent uses this information in a response, it can automatically include the image as a citation for context and transparency.
Note: Image citations are automatically enabled for Premium and Enterprise users who use AI Vision on their agent's knowledge base.
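The steps above can be pictured as a small pipeline. The following is only a conceptual sketch in Python, not the product's actual implementation: every name in it (`KnowledgeBase`, `process_images`, `answer_with_citations`, `fake_describe`) is hypothetical, the vision model is replaced by a stand-in function with canned descriptions, and the keyword lookup is an assumption made to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class KnowledgeEntry:
    """A text description stored alongside the name of its source image."""
    image_name: str
    description: str


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory knowledge base (illustration only)."""
    entries: List[KnowledgeEntry] = field(default_factory=list)

    def add(self, entry: KnowledgeEntry) -> None:
        self.entries.append(entry)

    def search(self, query: str) -> List[KnowledgeEntry]:
        # Naive keyword match, assumed here for simplicity.
        words = query.lower().split()
        return [e for e in self.entries
                if any(w in e.description.lower() for w in words)]


def process_images(image_names: List[str],
                   describe: Callable[[str], str],
                   kb: KnowledgeBase) -> None:
    """Upload -> describe -> store: one knowledge entry per image."""
    for name in image_names:
        kb.add(KnowledgeEntry(name, describe(name)))


def answer_with_citations(query: str, kb: KnowledgeBase) -> Dict[str, List[str]]:
    """Return matching descriptions plus the images they came from."""
    hits = kb.search(query)
    return {"context": [h.description for h in hits],
            "citations": [h.image_name for h in hits]}


# Stand-in for the vision model: canned descriptions, not a real API call.
CANNED = {
    "pump.png": "Schematic of a water pump with labeled valves",
    "sales.png": "Bar chart of quarterly sales",
}


def fake_describe(image_name: str) -> str:
    return CANNED.get(image_name, "Unrecognized image")


kb = KnowledgeBase()
process_images(["pump.png", "sales.png"], fake_describe, kb)
result = answer_with_citations("pump valves", kb)
print(result["citations"])  # ['pump.png']
```

The point of the sketch is the data flow: the agent never searches the image itself, only the generated description, and the image name travels with that description so it can be attached as a citation.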
Related articles:
- Enable AI Vision for uploaded images
- Vision processing limits
- Embed your photos in agent responses
- Activate image citations