AI Crawler Directory

18 crawlers from 12 companies — last updated May 2026

Every major AI company operates web crawlers that visit your site. Some collect training data for foundation models. Others power search features or respond to user requests. Understanding which is which lets you make informed robots.txt decisions — blocking training without losing visibility in AI-powered search.

Categories: AI Training, AI Search Index, User-Triggered Fetch, AI Feature Indexing, Open Dataset, Legacy / Deprecated

USER-AGENT          COMPANY        TYPE                  ROBOTS.TXT
GPTBot              OpenAI         AI Training           ✓ Respects
ChatGPT-User        OpenAI         User-Triggered Fetch  ✓ Respects
OAI-SearchBot       OpenAI         AI Search Index       ✓ Respects
ClaudeBot           Anthropic      AI Training           ✓ Respects
Claude-User         Anthropic      User-Triggered Fetch  ✓ Respects
Claude-SearchBot    Anthropic      AI Search Index       ✓ Respects
anthropic-ai        Anthropic      Legacy / Deprecated   ✓ Respects
Google-Extended     Google         AI Training           ✓ Respects
Applebot-Extended   Apple          AI Training           ✓ Respects
Meta-ExternalAgent  Meta           AI Training           ⚠ Partial
FacebookBot         Meta           AI Feature Indexing   ✓ Respects
PerplexityBot       Perplexity AI  AI Search Index       ✓ Respects
CCBot               Common Crawl   Open Dataset          ✓ Respects
Bytespider          ByteDance      AI Training           ⚠ Partial
Amazonbot           Amazon         AI Feature Indexing   ✓ Respects
Diffbot             Diffbot        Open Dataset          ✓ Respects
DeepSeekBot         DeepSeek       AI Training           ✓ Respects
cohere-ai           Cohere         AI Training           ✓ Respects

Understanding AI Web Crawlers

AI web crawlers are automated programs that visit websites to collect content. Unlike traditional search engine crawlers (Googlebot, Bingbot) that build search indexes, AI crawlers serve several distinct purposes:

Training Crawlers

These crawlers collect web content to build training datasets for foundation models. GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google), and Meta-ExternalAgent (Meta) all fall into this category. Blocking them keeps your content out of future training runs, but it does not remove content that was already collected or used to train existing models.

Search and Retrieval Crawlers

OAI-SearchBot, Claude-SearchBot, and PerplexityBot index content so it can appear as cited sources in AI-powered search products. Blocking these crawlers removes your site from those products' search results — a significant trade-off if AI search drives traffic to your site.

User-Triggered Fetchers

ChatGPT-User and Claude-User activate only when a human asks the AI assistant to read a specific URL. They are not autonomous crawlers: they fetch one page at a time in response to user requests. Blocking them stops the assistant from reading and citing your page even when a user explicitly asks for it.
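
To make the three categories above concrete, here is a minimal Python sketch that maps a request's User-Agent header to the category listed in the directory table. The function name `classify_user_agent`, the lookup dictionary, and the sample header strings are illustrative assumptions, not anything published by the crawler operators. The dictionary covers only a subset of the table, and it omits Google-Extended because Google documents it as a robots.txt control token rather than a string sent in request headers. User-Agent headers can also be spoofed, so treat the result as a hint rather than proof of identity.

```python
from typing import Optional

# Category lookup copied from a subset of the directory table.
# Keys are lowercase product tokens; matching is a case-insensitive
# substring test against the User-Agent header (an assumed heuristic).
CRAWLER_CATEGORIES = {
    "gptbot": "AI Training",
    "chatgpt-user": "User-Triggered Fetch",
    "oai-searchbot": "AI Search Index",
    "claudebot": "AI Training",
    "claude-user": "User-Triggered Fetch",
    "claude-searchbot": "AI Search Index",
    "meta-externalagent": "AI Training",
    "perplexitybot": "AI Search Index",
    "ccbot": "Open Dataset",
    "bytespider": "AI Training",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the directory category for a User-Agent header, or None."""
    ua = user_agent.lower()
    for token, category in CRAWLER_CATEGORIES.items():
        if token in ua:
            return category
    return None


if __name__ == "__main__":
    # Hypothetical header values, shortened for illustration.
    for header in (
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; Claude-SearchBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",
    ):
        print(header, "->", classify_user_agent(header))
```

Run against server access logs, a lookup like this gives a rough picture of how much traffic comes from training crawlers versus search and user-triggered fetchers before you decide what to block.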

The Recommended Approach

Most publishers who want to opt out of AI training while staying visible in AI search use this pattern:

# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: CCBot
Disallow: /

# Allow search + user-triggered fetchers (keep visibility)
# OAI-SearchBot, ChatGPT-User, Claude-User,
# Claude-SearchBot, PerplexityBot — leave unblocked

For a ready-to-use template, try our Block AI Crawlers guide or generate a custom robots.txt with the Robots.txt Generator.
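
Before deploying the pattern above, it can be worth checking that it behaves as intended. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `check_access` and the URL `https://example.com/article` are placeholders chosen for illustration. Note that this only shows how Python's parser interprets the rules. Each crawler uses its own parser, so the result is a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The blocking rules from the recommended pattern above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: CCBot
Disallow: /
"""


def check_access(robots_txt, agents, url):
    """Map each user-agent token to True if the rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "CCBot",
              "OAI-SearchBot", "ChatGPT-User", "Claude-User",
              "Claude-SearchBot", "PerplexityBot"]
    results = check_access(ROBOTS_TXT, agents, "https://example.com/article")
    for agent, allowed in results.items():
        print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
```

With these rules, the training crawlers come back blocked and the search and user-triggered agents come back allowed, because an agent with no matching group and no `User-agent: *` group is permitted by default.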

AI Crawler FAQ

How many AI crawlers are there in 2026?

The ai-robots-txt community project tracks over 100 user-agent strings associated with AI operations. This directory covers the 18 most impactful crawlers from major AI companies that website operators should know about.

Does blocking AI crawlers affect my Google Search rankings?

No. AI crawlers are controlled through user-agent tokens that are separate from those of search engine crawlers. Blocking GPTBot, ClaudeBot, or Google-Extended does not change the rules that apply to Googlebot or Bingbot, so traditional search indexing and ranking are unaffected.

Can I block AI training but stay visible in ChatGPT and Claude?

Yes. Block the training crawlers (GPTBot, ClaudeBot) while leaving the search and user-triggered crawlers (OAI-SearchBot, ChatGPT-User, Claude-User, Claude-SearchBot) allowed. Each user-agent is matched separately in robots.txt, so a rule for one does not apply to the others.
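
As a rough illustration of blocking by category, the sketch below generates robots.txt groups for every crawler whose category you choose to block and leaves the rest unmentioned. The names `DIRECTORY` and `build_robots_txt` are hypothetical, and the dictionary holds only the OpenAI and Anthropic rows from the directory table.

```python
# User-agent -> category, copied from the OpenAI and Anthropic rows
# of the directory table (illustrative subset).
DIRECTORY = {
    "GPTBot": "AI Training",
    "ChatGPT-User": "User-Triggered Fetch",
    "OAI-SearchBot": "AI Search Index",
    "ClaudeBot": "AI Training",
    "Claude-User": "User-Triggered Fetch",
    "Claude-SearchBot": "AI Search Index",
}


def build_robots_txt(directory, blocked_categories):
    """Emit one 'Disallow: /' group per crawler in a blocked category."""
    groups = [
        f"User-agent: {agent}\nDisallow: /"
        for agent, category in directory.items()
        if category in blocked_categories
    ]
    # Groups are separated by a blank line; crawlers in other
    # categories get no group and therefore stay allowed.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(DIRECTORY, {"AI Training"}))
```

Called with `{"AI Training"}`, this produces groups for GPTBot and ClaudeBot only, which matches the split described in the answer above.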