AI Crawler Directory
18 crawlers from 12 companies — last updated May 2026
Every major AI company operates web crawlers that visit your site. Some collect training data
for foundation models. Others power search features or respond to user requests.
Understanding which is which lets you make informed robots.txt decisions —
blocking training without losing visibility in AI-powered search.
| USER-AGENT | COMPANY | TYPE | ROBOTS.TXT |
|---|---|---|---|
| GPTBot | OpenAI | AI Training | ✓ Respects |
| ChatGPT-User | OpenAI | User-Triggered Fetch | ✓ Respects |
| OAI-SearchBot | OpenAI | AI Search Index | ✓ Respects |
| ClaudeBot | Anthropic | AI Training | ✓ Respects |
| Claude-User | Anthropic | User-Triggered Fetch | ✓ Respects |
| Claude-SearchBot | Anthropic | AI Search Index | ✓ Respects |
| anthropic-ai | Anthropic | Legacy / Deprecated | ✓ Respects |
| Google-Extended | Google | AI Training | ✓ Respects |
| Applebot-Extended | Apple | AI Training | ✓ Respects |
| Meta-ExternalAgent | Meta | AI Training | ⚠ Partial |
| FacebookBot | Meta | AI Feature Indexing | ✓ Respects |
| PerplexityBot | Perplexity AI | AI Search Index | ✓ Respects |
| CCBot | Common Crawl | Open Dataset | ✓ Respects |
| Bytespider | ByteDance | AI Training | ⚠ Partial |
| Amazonbot | Amazon | AI Feature Indexing | ✓ Respects |
| Diffbot | Diffbot | Open Dataset | ✓ Respects |
| DeepSeekBot | DeepSeek | AI Training | ✓ Respects |
| cohere-ai | Cohere | AI Training | ✓ Respects |
Understanding AI Web Crawlers
AI web crawlers are automated programs that visit websites to collect content. Unlike traditional search engine crawlers (Googlebot, Bingbot) that build search indexes, AI crawlers serve several distinct purposes:
Training Crawlers
These crawlers collect web content to build training datasets for foundation models. GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google), and Meta-ExternalAgent (Meta) all fall into this category. Blocking them keeps your content out of future training runs, but it does not remove content that was already collected or used to train existing models.
Search and Retrieval Crawlers
OAI-SearchBot, Claude-SearchBot, and PerplexityBot index content so it can appear as cited sources in AI-powered search products. Blocking these crawlers removes your site from those products' search results — a significant trade-off if AI search drives traffic to your site.
User-Triggered Fetchers
ChatGPT-User and Claude-User only activate when a human asks the AI assistant to read a specific URL. They are not autonomous crawlers: they fetch one page at a time in response to user requests. Blocking them prevents the assistant from reading, and therefore citing, your page even when a user explicitly asks for it.
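The three categories above can be applied to server logs by matching request headers against the known tokens. The following is a minimal sketch, not a definitive detector: the `CRAWLER_TYPES` mapping and the `classify_user_agent` helper are hypothetical names, the category labels are copied from the directory table, and the sample header strings are illustrative rather than the exact strings each vendor sends. Google-Extended is left out on the assumption that it is a robots.txt control token rather than a string that appears in request headers.

```python
from typing import Optional

# Category labels copied from the directory table (assumed current).
# Google-Extended is omitted on the assumption that it is only a
# robots.txt token and never appears in a User-Agent header.
CRAWLER_TYPES = {
    "GPTBot": "AI Training",
    "ClaudeBot": "AI Training",
    "Meta-ExternalAgent": "AI Training",
    "OAI-SearchBot": "AI Search Index",
    "Claude-SearchBot": "AI Search Index",
    "PerplexityBot": "AI Search Index",
    "ChatGPT-User": "User-Triggered Fetch",
    "Claude-User": "User-Triggered Fetch",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the crawler category for a User-Agent header, or None."""
    ua = user_agent.lower()
    for token, crawler_type in CRAWLER_TYPES.items():
        # Case-insensitive substring match on the crawler token.
        if token.lower() in ua:
            return crawler_type
    return None


# Illustrative header strings, not the vendors' exact values.
print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(classify_user_agent("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))
print(classify_user_agent("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Substring matching is a deliberate simplification. User-Agent headers can be spoofed, so a production check would usually also verify the requesting IP against the ranges each company publishes.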
The Recommended Approach
Most publishers who want to opt out of AI training while staying visible in AI search use this pattern:
    # Block training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Meta-ExternalAgent
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Allow search + user-triggered fetchers (keep visibility)
    # OAI-SearchBot, ChatGPT-User, Claude-User,
    # Claude-SearchBot, PerplexityBot: leave unblocked
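Before deploying a pattern like this, it can help to confirm that it blocks and allows what you expect. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate the rules offline. The `check_agents` helper and the `https://example.com/article` URL are hypothetical, and the standard-library parser's matching rules may differ in detail from what each crawler actually implements.

```python
from urllib.robotparser import RobotFileParser

# The recommended pattern from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: CCBot
Disallow: /
"""


def check_agents(robots_txt, agents, url="https://example.com/article"):
    """Map each user-agent token to True (may fetch) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


results = check_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "CCBot", "OAI-SearchBot", "ChatGPT-User",
     "Claude-User", "Claude-SearchBot", "PerplexityBot", "Googlebot"],
)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under these rules the training tokens come back blocked, while the search crawlers, the user-triggered fetchers, and Googlebot stay allowed because no group names them and there is no `User-agent: *` fallback.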
For a ready-to-use template, try our Block AI Crawlers guide
or generate a custom robots.txt with the Robots.txt Generator.
AI Crawler FAQ
How many AI crawlers are there in 2026?
The ai-robots-txt community project tracks over 100 user-agent strings associated with AI operations. This directory covers the 18 most impactful crawlers from major AI companies that website operators should know about.
Does blocking AI crawlers affect my Google Search rankings?
No. AI crawlers use user-agent tokens that are separate from those of search engine crawlers. Blocking GPTBot, ClaudeBot, or Google-Extended does not affect Googlebot or Bingbot, so traditional search indexing and ranking are unchanged. Google Search crawls and ranks pages through Googlebot, not Google-Extended.
Can I block AI training but stay visible in ChatGPT and Claude?
Yes. Block the training crawlers (GPTBot, ClaudeBot) while leaving the search and user-triggered crawlers (OAI-SearchBot, ChatGPT-User, Claude-User, Claude-SearchBot) allowed. Each operates independently.