GPTBot — OpenAI's Training Crawler

GPTBot is OpenAI's web crawler that collects training data for GPT models. Learn how to block GPTBot in robots.txt without losing ChatGPT Search visibility.

QUICK FACTS

USER-AGENT: GPTBot
OPERATOR: OpenAI
CATEGORY: AI Training
FIRST SEEN: 2023-08
ROBOTS.TXT: ✓ Respects directives

What is GPTBot?

GPTBot is OpenAI's primary web crawler for gathering training data used to build and improve GPT foundation models. It fetches publicly accessible web pages, and the collected data is filtered to exclude content that violates OpenAI's usage policies, paywalled sources, and personally identifiable information. Blocking GPTBot does not affect your visibility in ChatGPT Search, which is handled by a separate crawler, OAI-SearchBot.

How to Block GPTBot

Add the following to your robots.txt file (located at the root of your website):

User-agent: GPTBot
Disallow: /
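
To check the rule before deploying it, you can parse it locally. The following is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed and the example.com URL are illustrative assumptions, and Python's parser only approximates how GPTBot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from this page.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper for local testing; it uses Python's parser,
    not OpenAI's own robots.txt implementation.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("GPTBot", "OAI-SearchBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

With only the GPTBot group present, the check reports GPTBot as disallowed and OAI-SearchBot as allowed, since no rule in the file names the search crawler.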

What Happens When You Block GPTBot

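Once GPTBot is blocked, content it has not yet crawled will not be included in GPT training datasets. Data that has already been collected is not retroactively removed. ChatGPT Search citations and live browsing are not affected, because they rely on OAI-SearchBot and ChatGPT-User respectively.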

Should You Block GPTBot?

GPTBot is a training crawler: it collects data to build AI models. If you want to prevent your content from being used in future AI training by OpenAI, block it. The block is not retroactive: it only affects future crawls, not data already collected.

GPTBot vs Other OpenAI Crawlers

OpenAI operates multiple crawlers, each serving a different purpose:

User-agent    | Purpose                                      | Type
GPTBot        | Collects training data for GPT models        | AI Training
ChatGPT-User  | Live page fetches triggered by ChatGPT users | User-Triggered Fetch
OAI-SearchBot | Indexes content for ChatGPT Search citations | AI Search Index

Each crawler operates independently. Blocking GPTBot does not block ChatGPT-User or OAI-SearchBot — you must add a separate rule for each.
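
As a sketch of what separate rules look like in practice, the snippet below generates one robots.txt group per blocked crawler and then checks which of the three user agents remain allowed. The function names build_robots_txt and allowed_agents and the example.com URL are hypothetical, and the check relies on Python's urllib.robotparser rather than OpenAI's own parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens listed in the table above.
OPENAI_CRAWLERS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]


def build_robots_txt(blocked_agents):
    """Build robots.txt text with one 'Disallow: /' group per agent.

    Hypothetical helper: groups are separated by a blank line so that
    each crawler gets its own independent rule.
    """
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    return "\n".join(groups)


def allowed_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that may still fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Block training and user-triggered fetches, keep the search crawler.
    robots_txt = build_robots_txt(["GPTBot", "ChatGPT-User"])
    print(robots_txt)
    print(allowed_agents(robots_txt, OPENAI_CRAWLERS))  # ['OAI-SearchBot']
```

Passing only "GPTBot" to build_robots_txt leaves both ChatGPT-User and OAI-SearchBot allowed, which matches the point above that each crawler needs its own rule.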
