What happens if my website doesn’t have a sitemap?
This step-by-step guide explains how CustomGPT.ai handles website crawling when no sitemap is detected, and what options are available if you'd prefer to start indexing from a specific URL.
How it works when no sitemap is found
When you create an agent using a website URL, CustomGPT.ai first attempts to detect and use a sitemap to find all pages efficiently.
If no sitemap is found, the system defaults to recursive crawling starting from the main domain (home page) of the provided URL. This means that if you enter the URL of a specific subpage, the agent starts crawling from the home page instead. As a result, it may index content that isn’t relevant or miss the section you intended to focus on.
Example:
- If you enter: https://customgpt.ai/customer-intelligence
- We will begin crawling from: https://customgpt.ai/
- This can lead to the agent indexing unrelated content or missing the section you intended to focus on.
Note: If no links are found during this crawl, CustomGPT.ai will index only the single page you provided.
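To make the fallback behavior concrete, here is a minimal Python sketch of the idea, not CustomGPT.ai's actual crawler. The function names `home_page` and `crawl` and the in-memory `links` dictionary are hypothetical, introduced only for illustration; a real crawler would fetch pages over HTTP and extract links from them.

```python
from collections import deque
from urllib.parse import urlsplit, urlunsplit


def home_page(url: str) -> str:
    """Return the root ("home page") URL for the given URL's domain."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


def crawl(start_url: str, links: dict[str, list[str]]) -> list[str]:
    """Recursively follow links breadth-first, starting at start_url.

    `links` is a hypothetical stand-in for the site: it maps each page
    URL to the URLs that page links to.
    """
    indexed = [start_url]
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in indexed:
                indexed.append(target)
                queue.append(target)
    return indexed


provided = "https://customgpt.ai/customer-intelligence"

# Assumed default when no sitemap is detected: start from the home page.
start = home_page(provided)

# If the crawl finds no links at all, only a single page ends up indexed.
single_page = crawl(provided, links={})
```

In this sketch, `start` is the root of the domain rather than the subpage that was entered, which is why unrelated sections can end up in the index.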
What if I want crawling to start from the URL I provided?
We’ve added a new feature that allows you to control where crawling begins. You can now choose to start crawling from the exact URL you entered.
- Go to the New Agent page in your CustomGPT.ai dashboard.

- Choose Website as the data source.

- Enter the URL you want to start crawling from (e.g., a specific subpage).

- Click Not what you expected? below the URL field.

- Select the option Start crawling from the provided URL rather than the home page and click Create Agent to proceed. This will start crawling from the exact page you provided, instead of defaulting to the root domain.

Note: This also applies during auto-sync. If you use this setting, future syncs will continue to crawl from the specified page.
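The effect of this setting can be pictured with a short Python sketch. This is an illustration under assumptions, not the product's implementation: the `WebsiteSource` class and its `start_from_provided_url` field are hypothetical names standing in for the option selected in the dashboard.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass
class WebsiteSource:
    """Hypothetical model of a website data source and its crawl setting."""

    provided_url: str
    start_from_provided_url: bool = False  # assumed default: home page

    def crawl_start(self) -> str:
        """Return the URL a crawl (or a later sync) would begin from."""
        if self.start_from_provided_url:
            return self.provided_url
        parts = urlsplit(self.provided_url)
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


source = WebsiteSource(
    "https://customgpt.ai/customer-intelligence",
    start_from_provided_url=True,
)

# The setting is stored with the source, so the first crawl and any
# later sync resolve to the same starting page.
initial_start = source.crawl_start()
sync_start = source.crawl_start()
```

Because the choice is stored alongside the URL in this sketch, every later sync reuses it, which mirrors the auto-sync note above.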