How AI Vision works

This guide explains how AI Vision processes uploaded images to extract meaningful information and add it to your agent's knowledge base.

What AI Vision does

AI Vision uses advanced visual recognition technology to understand images. It identifies what appears in an image, creates a short natural-language description, and stores that description in your agent's knowledge base.

Once an image is processed, your agent can use this visual information just like any other data source, which helps it provide richer and more accurate responses.
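
To illustrate what "just like any other data source" can mean in practice, here is a minimal Python sketch. It is not AI Vision's actual implementation: the `Source` class, the `search` function, and the sample entries are hypothetical names invented for this example, and the keyword match stands in for whatever retrieval the product really uses. The point is only that an image's generated description is stored as text, so it can be searched alongside ordinary documents.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One knowledge-base entry (hypothetical shape, for illustration only)."""
    name: str   # file name or page title
    kind: str   # "document" or "image"
    text: str   # document text, or the generated image description


def search(sources: list[Source], query: str) -> list[Source]:
    """Naive keyword match that treats every source the same way."""
    words = query.lower().split()
    return [s for s in sources if any(w in s.text.lower() for w in words)]


# Sample data: one regular document and one image description.
knowledge_base = [
    Source("pricing.pdf", "document", "The Pro plan costs $20 per month."),
    Source("wiring.png", "image",
           "A wiring diagram showing a router connected to two switches."),
]

hits = search(knowledge_base, "router diagram")
print([s.name for s in hits])
```

Because the search only looks at `text`, the image entry is matched the same way a document would be; the `kind` field is kept so the original image can still be identified later.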


What AI Vision can interpret

AI Vision can effectively process:

  • Technical diagrams and schematics
  • Charts and graphs
  • Photographs
  • Illustrations
  • Handwritten text
  • Screenshots
  • Product images
  • Any other image type

Note:

AI Vision can also perform OCR (Optical Character Recognition) to extract text from images, with best results in English.


How the process works

  1. Image upload: You upload one or more images through the file upload modal or when creating a new agent.
  2. Image processing: The images are processed using OpenAI's advanced vision models, which analyze the visual content in detail.
  3. AI analysis and description: The system intelligently interprets what's in the image, such as objects, text, or diagrams, and generates a clear, human-readable summary.
  4. Knowledge base integration: The generated description is automatically added to your agent's knowledge base, where it functions as a regular content source.
  5. Response reference and citations: When your agent uses this information in a response, it can automatically include the image as a citation for context and transparency.

Note:

Image citations are automatically enabled for Premium and Enterprise users who use AI Vision on their agent's knowledge base.
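
The five steps above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the product's real code: `Agent`, `ingest_image`, `answer`, `fake_describe`, and `CITATION_PLANS` are all hypothetical names, the vision model is replaced by a stand-in function that returns a canned description, and the sketch assumes that plans other than Premium and Enterprise return no image citation, which the note does not explicitly state.

```python
from dataclasses import dataclass, field
from typing import Callable

# Plans with image citations enabled, per the note above (assumed lowercase keys).
CITATION_PLANS = {"premium", "enterprise"}


@dataclass
class Agent:
    """Hypothetical agent holding a plan name and stored image descriptions."""
    plan: str
    knowledge_base: dict[str, str] = field(default_factory=dict)  # file name -> description


def ingest_image(agent: Agent, file_name: str, image_bytes: bytes,
                 describe: Callable[[bytes], str]) -> None:
    """Steps 1-4: accept an uploaded image, describe it, store the description."""
    description = describe(image_bytes)            # steps 2-3: vision model call
    agent.knowledge_base[file_name] = description  # step 4: regular content source


def answer(agent: Agent, question: str) -> dict:
    """Step 5: answer from stored descriptions and cite the image if enabled."""
    words = question.lower().split()
    for file_name, description in agent.knowledge_base.items():
        if any(w in description.lower() for w in words):
            citation = file_name if agent.plan in CITATION_PLANS else None
            return {"text": description, "citation": citation}
    return {"text": "No matching source found.", "citation": None}


def fake_describe(image_bytes: bytes) -> str:
    """Stand-in for the vision model; returns a canned description."""
    return "Bar chart of monthly sales, peaking in March."


agent = Agent(plan="premium")
ingest_image(agent, "sales.png", b"\x89PNG...", fake_describe)
print(answer(agent, "monthly sales"))
```

Passing the describer in as a function keeps the sketch runnable without network access; in a real integration that callable would wrap a request to a vision model.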

