Is Your Website Blocking AI Crawlers? Check Before 15 Sept

TL;DR

Cloudflare confirmed on 1 July 2026 that from 15 September it will block AI training and agent crawlers by default on ad-carrying pages for newly onboarded domains, and TechCrunch reports the change also covers free-plan customers. If AI crawlers cannot read your site, ChatGPT, Claude, Perplexity and Gemini describe your business from third-party sources or not at all. A five-minute check of robots.txt and your Cloudflare dashboard shows where you stand.

On 1 July 2026, Cloudflare announced its biggest change yet to how websites handle AI crawlers. New defaults arrive on 15 September. Some UK business sites will be affected directly. Many more are already blocking AI crawlers without their owners knowing, often because of a setting a developer chose years ago. Here is what is changing, why it matters, and how to check your own site in five minutes.

What is changing at Cloudflare on 15 September 2026

On 15 September 2026, Cloudflare will change its default settings so that AI crawlers used for training and agent activity are blocked on pages that carry advertising, unless the site owner chooses otherwise. Cloudflare confirmed the change on 1 July 2026. Search crawlers remain allowed by default.

Here is what Cloudflare itself has confirmed. Sites on Cloudflare can now manage AI traffic in three categories: Search, which indexes content for search results; Agent, which fetches pages in real time on behalf of a user; and Training, which collects content to train or fine-tune AI models (Cloudflare, 1 July 2026). From 15 September, all new domains joining Cloudflare will have Training and Agent blocked by default on pages that display ads, with Search still allowed. Multi-purpose crawlers that mix these jobs (Cloudflare names Googlebot, Applebot and Bingbot as examples) will be subject to the most restrictive rule a site applies (Cloudflare, 1 July 2026).
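To make the "most restrictive rule" idea concrete, here is a minimal Python sketch. It is an illustration of the logic as described above, not Cloudflare's implementation: the Action type, the policy dictionary and the crawler category sets are all hypothetical names chosen for the example.

```python
from enum import IntEnum


class Action(IntEnum):
    """Higher value means more restrictive (hypothetical model)."""
    ALLOW = 0
    BLOCK = 1


def effective_action(crawler_categories, policy):
    """Return the most restrictive action across a crawler's categories.

    crawler_categories must contain at least one category name.
    """
    return max(policy[category] for category in crawler_categories)


# Assumed policy mirroring the described default for ad-carrying pages.
default_policy = {
    "search": Action.ALLOW,
    "agent": Action.BLOCK,
    "training": Action.BLOCK,
}

# A hypothetical single-purpose search crawler and a multi-purpose one.
search_only = effective_action({"search"}, default_policy)
search_and_training = effective_action({"search", "training"}, default_policy)

print(search_only.name)
print(search_and_training.name)
```

Under this model, a crawler that only does search stays allowed, while one that mixes search with training inherits the block from the training category.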

Separately, TechCrunch reports that the new defaults will also apply to new sites set up by existing customers and to all free-plan customers, who can opt out before the deadline (TechCrunch, 1 July 2026). Many UK small-business sites sit on Cloudflare's free plan, so that reported detail matters. There is also older context: Cloudflare has blocked AI crawlers by default for newly signed-up domains since 1 July 2025 (Cloudflare, 1 July 2025). If your site moved to Cloudflare in the past year, AI crawlers may already be shut out.

Why being crawlable matters for AI visibility

AI assistants can only describe your business accurately if their crawlers can read your website. If GPTBot, ClaudeBot, PerplexityBot and Google-Extended are blocked, tools such as ChatGPT, Claude, Perplexity and Gemini rely on third-party sources, old data or nothing at all. Crawler access is the foundation of AI visibility.

More people now ask AI assistants for recommendations before they ever reach Google. When those assistants cannot fetch your pages, they fall back on directories, review sites and whatever was written about you elsewhere. That picture may be out of date or simply wrong. Crawler access does not get you recommended on its own, but without it the rest of your AI search optimisation work has nothing to stand on.

The main AI crawlers and what each one does

The AI crawlers that matter most to UK businesses are GPTBot and OAI-SearchBot from OpenAI, ClaudeBot and Claude-SearchBot from Anthropic, PerplexityBot from Perplexity, and the Google-Extended control used by Google. Each serves a different purpose, and each can be allowed or blocked separately in your robots.txt file.

GPTBot collects content that may be used to train OpenAI's models (OpenAI, accessed July 2026).
OAI-SearchBot indexes pages so ChatGPT can link to and cite websites in its search answers. Blocking it makes you harder for ChatGPT search to surface (OpenAI, accessed July 2026).
ChatGPT-User fetches a live page when a user asks ChatGPT about it (OpenAI, accessed July 2026).
ClaudeBot collects content that may contribute to training Anthropic's models. Claude-SearchBot crawls to improve Claude's search results, and Claude-User fetches pages when users ask. All three respect robots.txt and are controlled independently (Anthropic, accessed July 2026).
PerplexityBot indexes content for Perplexity's search results. Perplexity states it is not used to train foundation models. Perplexity-User visits pages when a user asks a question (Perplexity, accessed July 2026).
Google-Extended is a robots.txt control rather than a separate bot. It governs whether content Googlebot crawls can be used for Gemini training and for grounding. Google states it does not affect a site's inclusion or ranking in Google Search (Google Search Central, accessed July 2026).

One caution. Googlebot and Bingbot also power ordinary search. Never block those two unless you fully understand the consequences.

The five-minute self-check

Checking whether your site blocks AI crawlers takes about five minutes. Look at your robots.txt file for blocked bot names, review your Cloudflare dashboard settings if you use Cloudflare, and ask your developer one direct question. No technical skill is needed to spot the most common problems.

Step 1: read your robots.txt

Type yourdomain.com/robots.txt into your browser. Use Ctrl+F to search for GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot and Google-Extended. A User-agent line naming a bot, followed by Disallow: /, means that bot is blocked from the whole site. User-agent: * followed by Disallow: / blocks every bot that does not have its own rules. If a bot is not mentioned at all and there is no blanket User-agent: * block, it is allowed by default.
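If you or your developer would rather script this check, the following is a minimal sketch using Python's standard urllib.robotparser module. The sample robots.txt text and the check_robots helper are made up for illustration. Python's parser is one interpretation of the rules, and individual crawlers may treat edge cases differently, so treat the output as a guide.

```python
from urllib.robotparser import RobotFileParser

# Bot names taken from the list in this article.
AI_BOTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "Claude-SearchBot",
    "PerplexityBot",
    "Google-Extended",
]


def check_robots(robots_txt, bots=AI_BOTS, path="/"):
    """Return {bot_name: True if the rules allow it to fetch path}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


# Illustrative robots.txt: GPTBot is blocked, everyone else
# is only kept out of the admin area.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

report = check_robots(SAMPLE)
for bot, allowed in report.items():
    print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

To run it against a live site instead of the sample text, the same parser can load the file itself: call set_url() with the address of your robots.txt, then read(), before calling can_fetch().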

Step 2: check your Cloudflare dashboard

Log in, select your domain and open AI Crawl Control. The Crawlers tab lists each AI crawler with an allow or block action (Cloudflare Docs, accessed July 2026). Also review your AI bot policy under the security settings, where the Search, Agent and Training presets live, and check whether Cloudflare's managed robots.txt is adding block rules on your behalf. Do this before 15 September if you are on the free plan.

Step 3: ask your developer

Copy and send this: "Can GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot and PerplexityBot fetch our pages, or are they blocked in robots.txt, a firewall or Cloudflare?" This matters because network-level blocks leave no trace in robots.txt. Your robots.txt can look clean while Cloudflare or a security plugin turns crawlers away.
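A developer can get a rough signal for network-level blocks by requesting a page while sending a crawler-style User-Agent header. The sketch below uses only Python's standard library. The probe and interpret_status helpers and the status-code groupings are assumptions made for illustration. Real crawlers send longer User-Agent strings, and many firewalls verify crawlers by IP address, so a successful response here does not prove the real bot gets through. A refusal does suggest a user-agent rule somewhere.

```python
import urllib.error
import urllib.request


def interpret_status(status):
    """Map an HTTP status code to a rough verdict (assumed groupings)."""
    if 200 <= status < 300:
        return "reachable"
    if status in (401, 403, 429):
        return "likely blocked or challenged"
    return "inconclusive"


def probe(url, user_agent, timeout=10):
    """Fetch url with the given User-Agent and return the HTTP status.

    Connection failures (urllib.error.URLError) are left to propagate.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return error.code


# Example usage (placeholder domain and simplified User-Agent token):
# status = probe("https://yourdomain.com/", "GPTBot")
# print(status, interpret_status(status))
```

Comparing the result with a request that sends an ordinary browser User-Agent makes the signal clearer: if the browser request succeeds and the crawler-style one is refused, something between the visitor and your server is filtering by bot name.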

When blocking AI crawlers is the right call

Blocking AI crawlers is a legitimate choice for some businesses. Publishers who sell content, membership sites, and firms with genuine scraping concerns may be better off blocked, or better off charging for access through tools such as Cloudflare's marketplace. The right answer depends on how your business earns money.

If your words are the product, giving them to AI models without payment is a real cost. That is the problem Cloudflare says it is addressing with Pay Per Use, a marketplace where publishers can charge AI companies when their content contributes to an answer, with Ceramic.ai and You.com as early partners (Cloudflare, 1 July 2026). A middle path also exists: block the training bots but allow the search and user-triggered ones. For most UK service businesses, though, being found and described accurately is worth more than the training value of their pages.

How to unblock AI crawlers in robots.txt

To unblock an AI crawler, remove the Disallow line that names it from your robots.txt file, or add an explicit Allow rule for that bot. If Cloudflare is blocking the bot at network level, you must also change the setting in your Cloudflare dashboard. Both layers need to permit access.

These lines allow the main AI crawlers site-wide:

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Only edit the AI bot lines. Leave rules protecting admin areas alone. If you want search visibility without training, allow OAI-SearchBot and Claude-SearchBot but keep GPTBot and ClaudeBot disallowed. In Cloudflare, set each crawler to Allow in AI Crawl Control and confirm your AI bot policy matches. On WordPress, robots.txt is often controlled by an SEO plugin, so change it there rather than uploading a file.
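The "search without training" option can be sketched in Python as follows. The build_rules helper is hypothetical. It generates only the AI bot groups, which would sit alongside your existing robots.txt rules, not replace them. The result is then read back with Python's standard parser as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Bots named in this article for the "search without training" option.
SEARCH_BOTS = ["OAI-SearchBot", "Claude-SearchBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot"]


def build_rules(allow, disallow):
    """Build robots.txt groups: Allow: / for some bots, Disallow: / for others."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allow]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in disallow]
    return "\n\n".join(groups) + "\n"


rules = build_rules(SEARCH_BOTS, TRAINING_BOTS)

# Read the generated rules back to confirm they say what we intended.
parser = RobotFileParser()
parser.parse(rules.splitlines())
access = {bot: parser.can_fetch(bot, "/") for bot in SEARCH_BOTS + TRAINING_BOTS}

print(rules)
print(access)
```

This only covers the robots.txt layer. If Cloudflare or a firewall blocks the same bots at network level, those settings still need changing separately.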

What to expect after unblocking

Unblocking AI crawlers does not produce instant results. Crawlers revisit sites on their own schedules, so it can take days or weeks before AI assistants reflect your content. Access is necessary but not sufficient. What AI tools then say about you depends on your content and your wider footprint online.

The crawlers do return often. Cloudflare says more than half of AI crawler traffic is re-fetching pages that have not changed (TechCrunch, 1 July 2026). User-triggered fetchers such as ChatGPT-User and Perplexity-User read pages live, so answers that rely on them can improve as soon as access opens. From there, whether AI assistants actually recommend you comes down to clear service pages, evidence of expertise and consistent mentions elsewhere. Our guide to getting recommended by ChatGPT in the UK covers that next stage.

Questions people ask

These are the questions UK business owners ask us most often about AI crawler access. The short version: checking is quick, allowing AI crawlers does not affect your Google rankings, and Cloudflare has blocked AI crawlers by default for new domains for the past year, so newer sites should check first.

How do I check if my website is blocking ChatGPT?

Visit yourdomain.com/robots.txt in a browser and search the page for GPTBot and OAI-SearchBot. If either name appears in a User-agent line with Disallow: / beneath it, that crawler is blocked. A User-agent: * line followed by Disallow: / blocks both unless they have their own rules. If you use Cloudflare, also check the AI Crawl Control section of your dashboard, because Cloudflare can block bots without any sign in robots.txt.

Will allowing AI crawlers affect my Google rankings?

No. AI crawler permissions are separate from normal search crawling. Google states that its Google-Extended control does not affect a site's inclusion or ranking in Google Search, and allowing bots such as GPTBot or ClaudeBot has no bearing on how Googlebot indexes your site.

Does Cloudflare block AI crawlers by default?

For new domains, yes. Cloudflare has blocked AI crawlers by default for newly signed-up sites since July 2025. From 15 September 2026 its defaults change again: crawlers classed as training or agent traffic will be blocked on pages that display ads, while search crawlers remain allowed. Existing customers can adjust settings in the dashboard.


Not sure what your site is telling AI crawlers?

Crawler access is one of the first things our fixed-price audit checks, alongside everything AI already says about you.

