Verify Responses - Best practices

This guide covers how to use Verify Responses to systematically audit your AI agent, identify different types of inaccuracies, and take action to improve response quality over time.

Why auditing your AI agent matters

Your AI agent answers questions every day. Some responses are accurate. Some are close. And some contain errors you may never hear about.

Most users don't report inaccurate answers. They simply leave. Without a systematic way to check response quality, small problems can persist for weeks or months, eroding user trust without your knowledge.

Verify Responses gives you visibility into how your agent arrives at each answer. It extracts claims, checks them against your source documents, and calculates a verified claims score. This allows you to catch problems proactively and continuously improve your agent.
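Conceptually, the verified claims score is the fraction of extracted claims that are supported by your source documents. Here's a minimal sketch of that calculation; the `Claim` structure and sample claims are illustrative, not the actual internal implementation:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    verified: bool  # True if the claim is supported by a source document

def verified_claims_score(claims: list[Claim]) -> float:
    """Fraction of extracted claims supported by sources (0.0 to 1.0)."""
    if not claims:
        return 0.0
    return sum(c.verified for c in claims) / len(claims)

claims = [
    Claim("Plan X includes 5,000 queries/month", verified=True),
    Claim("Refunds are processed within 24 hours", verified=False),
    Claim("Zapier integration supports file uploads", verified=True),
]
print(verified_claims_score(claims))  # 2 of 3 claims verified -> ~0.67
```

A response where every claim is grounded in your documents scores 1.0; unsupported claims pull the score down, which is what makes low scores a useful audit signal.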

Case Study: At CustomGPT.ai, we ran Verify Responses on our own support agents. The results helped us spot gaps we didn't know existed. Here's the full case study, followed by a step-by-step methodology you can use.

🚧 Note: Verify Responses uses Claude Sonnet 4.5 to analyze and verify your agent's responses. We're continuously testing models to ensure the best accuracy for this feature. If you have specific requirements or want to use a different high-performance model for verification, contact our sales team to discuss options.


Understanding the three types of inaccuracies

When you audit your agent's responses, you'll find that most inaccuracies fall into one of three categories. Identifying which type you're dealing with helps you apply the right fix.

1. Persona-related inaccuracies

These occur when your agent's persona (custom instructions) is too vague or contains conflicting guidance. The agent may make assumptions, use an inappropriate tone, or answer questions outside its intended scope.

Signs of persona issues:

  • Agent provides answers when it should say "I don't know"
  • Responses have inconsistent tone or style
  • Agent makes assumptions not supported by your content

How to fix: Review and refine your persona settings. Be specific about when the agent should answer directly versus when it should acknowledge limitations. Use positive instructions (e.g., "Always cite your sources") rather than negative ones (e.g., "Don't make things up").

2. System retrieval issues

Sometimes the agent retrieves the wrong content from your knowledge base, or combines information in ways that create inaccuracies. These issues are harder to spot but often affect multiple responses.

Signs of retrieval issues:

  • Verified claims scores are low even though relevant content exists in your knowledge base
  • Agent cites sources that don't fully support its claims
  • Similar questions produce inconsistent quality

How to fix:

First, try adjusting your agent's capabilities and model settings:

  • Enable Highest Relevance - This improves how your agent matches questions to the most relevant content in your knowledge base.
  • Enable Complex Reasoning - This helps with multi-step questions where the agent needs to synthesize information from multiple sources.
  • Try a different model - Some models handle certain content types better than others. Consider testing with Claude Sonnet 4.5 or ChatGPT 4.1 to see if accuracy improves.

You can find these options in your agent's Intelligence settings.

If you've tried these adjustments and still notice patterns in retrieval issues, contact our support team. We can help diagnose whether content formatting, metadata, or other factors are affecting retrieval quality.

3. Documentation gaps

This is often the most valuable discovery. Your agent may be doing its job correctly (retrieving the most relevant content available), but your knowledge base simply doesn't contain the information users need.

Signs of documentation gaps:

  • Low verified claims scores on specific topics
  • Agent pieces together answers from loosely related content
  • Users repeatedly ask questions your docs don't cover

How to fix: Create new content to fill the gaps. Every low-scoring response on a frequently asked topic is a signal that your knowledge base needs expansion.

What we found: When auditing our own agents, we discovered users were asking about Zapier integrations for email ingestion. Our agent gave detailed responses, but Verify Responses flagged accuracy issues. The problem? We had never written documentation for these workflows. We created two new articles: Upload a File to the Agent Using Zapier and Automatically Sync Gmail Emails to Your Agent's Knowledge Base. Now these questions get verified, accurate answers.


Step-by-step audit process

Follow this process to systematically audit and improve your AI agent using Verify Responses.

Step 1: Enable Verify Responses in testing mode

Go to your agent's Agentic Actions settings and enable Verify Responses. When enabled in testing mode, verification runs automatically on every conversation, giving you immediate visibility into verified claims.

Step 2: Collect real conversations

Let your agent handle real questions, either from actual users or from team members testing realistic scenarios. Avoid only testing questions you already know your docs answer well. The goal is to see what users actually experience.

Step 3: Review accuracy scores in Customer Intelligence

Use the Customer Intelligence dashboard to filter conversations by accuracy score. Focus on responses that fall below your acceptable threshold. These are your starting points for investigation.
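If you export conversation data for offline review, a quick triage pass like the one below surfaces the lowest-scoring responses first. The field names and sample records are illustrative, not the actual export schema:

```python
conversations = [
    {"id": "c1", "question": "How do I reset my password?", "score": 0.95},
    {"id": "c2", "question": "Can I sync Gmail via Zapier?", "score": 0.40},
    {"id": "c3", "question": "What plans support SSO?", "score": 0.55},
]

THRESHOLD = 0.70  # your acceptable verified-claims threshold

# Lowest scores first: these are your starting points for investigation.
needs_review = sorted(
    (c for c in conversations if c["score"] < THRESHOLD),
    key=lambda c: c["score"],
)
for c in needs_review:
    print(f'{c["id"]}: {c["score"]:.2f} - {c["question"]}')
```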

Step 4: Categorize each issue

For each low-scoring response, determine which category it falls into:

Issue Type | Question to Ask | Typical Fix
Persona | Is the agent behaving outside its intended role? | Refine persona settings
Retrieval | Does relevant content exist but wasn't used correctly? | Enable Highest Relevance or Complex Reasoning, try Claude Sonnet 4.5 or ChatGPT 4.1, or contact support
Documentation | Is the content simply missing from your knowledge base? | Create new documentation
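The decision logic above can be sketched as a small triage helper. The flag names and return strings are illustrative, a way to make the categorization repeatable during an audit:

```python
def categorize_issue(relevant_content_exists: bool,
                     content_used_correctly: bool,
                     behaving_in_role: bool) -> str:
    """Map audit observations to one of the three inaccuracy types."""
    if not behaving_in_role:
        return "persona: refine custom instructions"
    if relevant_content_exists and not content_used_correctly:
        return "retrieval: adjust Intelligence settings or contact support"
    if not relevant_content_exists:
        return "documentation: create new content"
    return "ok: no fix needed"

# Example: the content is missing entirely -> documentation gap.
print(categorize_issue(relevant_content_exists=False,
                       content_used_correctly=False,
                       behaving_in_role=True))
```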

Step 5: Implement fixes and retest

Make changes based on your findings. Then run the same questions again and compare verified claims scores. This confirms your fix worked.
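A simple before/after comparison over the same question set confirms whether a fix moved the numbers. The question IDs and scores below are illustrative:

```python
before = {"q1": 0.50, "q2": 0.80, "q3": 0.45}
after  = {"q1": 0.90, "q2": 0.80, "q3": 0.60}

# Re-run the same questions and compare verified claims scores per question.
for q in before:
    delta = after[q] - before[q]
    status = "improved" if delta > 0 else ("unchanged" if delta == 0 else "regressed")
    print(f"{q}: {before[q]:.2f} -> {after[q]:.2f} ({status})")
```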

Step 6: Switch to on-demand mode for production

Once you've addressed initial issues, you can switch Verify Responses to on-demand mode for production use. This lets you spot-check specific conversations without running verification on every query.

Tip: Keep Verify Responses enabled continuously if you need governance and full auditability for compliance purposes. Claims data will appear in your Customer Intelligence analytics.


Using accuracy data for ongoing improvement

Verify Responses isn't just for one-time audits. Use it as an ongoing quality management tool.

Monitor trends over time

Check your Customer Intelligence dashboard regularly. Are verified claims scores improving? Are certain topics consistently problematic? Trends tell you whether your improvements are working.
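The dashboard surfaces trends for you, but the same idea applies to exported data: group scores by week and compare averages. A minimal sketch with illustrative dates and scores:

```python
from collections import defaultdict
from datetime import date

scored = [
    (date(2024, 6, 3), 0.60), (date(2024, 6, 5), 0.70),
    (date(2024, 6, 10), 0.80), (date(2024, 6, 12), 0.90),
]

weekly = defaultdict(list)
for day, score in scored:
    # isocalendar()[:2] is (ISO year, ISO week), a stable weekly bucket key.
    weekly[day.isocalendar()[:2]].append(score)

for week, scores in sorted(weekly.items()):
    print(week, round(sum(scores) / len(scores), 2))
```

A rising weekly average tells you your fixes are working; a flat or falling one tells you where to dig in next.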

Build a feedback loop

Every inaccurate response is a signal:

  • Low verified claims + persona issue → Refine your custom instructions
  • Low verified claims + retrieval problem → Investigate content structure or contact support
  • Low verified claims + missing content → Add new documentation

This feedback loop turns user questions into continuous improvement.

Prioritize high-impact fixes

Focus first on inaccuracies that appear frequently or affect critical user journeys. A single fix to a common question can improve hundreds of future interactions.
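One way to rank fixes is frequency weighted by severity, i.e. how far below your threshold a topic scores. The topics and numbers below are illustrative:

```python
topics = [
    {"topic": "Zapier email ingestion", "asks_per_week": 40, "avg_score": 0.45},
    {"topic": "SSO setup", "asks_per_week": 5, "avg_score": 0.30},
    {"topic": "Billing cycles", "asks_per_week": 60, "avg_score": 0.85},
]

THRESHOLD = 0.70

def impact(t):
    # Frequency times score shortfall: common, low-scoring topics rank highest.
    return t["asks_per_week"] * max(0.0, THRESHOLD - t["avg_score"])

for t in sorted(topics, key=impact, reverse=True):
    print(f'{t["topic"]}: impact {impact(t):.1f}')
```

Note that a frequent topic already above threshold (like "Billing cycles" here) ranks last even though it gets the most questions, which is exactly the prioritization this section describes.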


Best practices summary

  • Audit proactively: Don't wait for user complaints. Use Verify Responses to find issues before they erode trust.
  • Categorize issues: Understanding whether a problem is persona, retrieval, or documentation helps you apply the right fix.
  • Let users guide content creation: Low verified claims scores reveal what your knowledge base is missing. Use them to prioritize new documentation.
  • Retest after fixes: Always verify that your changes improved verified claims scores.
  • Use Customer Intelligence filters: Sort by accuracy score to quickly find conversations that need attention.
  • Consider continuous verification for compliance: If you need audit trails for governance, keep Verify Responses enabled in production.
