Robots.txt Generator
A robots.txt file controls which pages on a site web crawlers may access. Rules are grouped by User-agent (* matches all bots; a specific name such as Googlebot matches only that crawler). Disallow blocks a path or file (Disallow: /private/, Disallow: /file.html), and Allow permits one (Allow: /public/). A Sitemap line such as Sitemap: https://example.com/sitemap.xml points crawlers to your XML sitemap. The file must live at the site root (example.com/robots.txt). Two wildcards are supported: * matches any sequence of characters and $ marks the end of a URL. Test your rules with the robots.txt testing tool in Google Search Console. Remember that robots.txt does not guarantee privacy; use authentication for sensitive content. It is still essential for SEO, keeping crawlers focused on your important pages and away from duplicate content.
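For example, the directives above combine into a complete file:

User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/
Allow: /public/
Sitemap: https://example.com/sitemap.xml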
Generate robots.txt files visually. Add user-agent rules for Googlebot, Bingbot, AI crawlers, and custom bots. Set allow/disallow paths, crawl-delay, and sitemap URL. Includes presets for allow all, block all, block AI crawlers, and standard websites. Live preview with validation warnings.
Quick Presets
User-Agent Rules
Sitemap URL
Generated robots.txt
User-agent: *
Disallow:
Common Bot Reference
How to Use
- Pick a quick preset or add User-agent rules for the bots you want to target
- Set allow/disallow paths, an optional crawl-delay, and your sitemap URL
- Copy the generated robots.txt from the live preview and upload it to your site root
Frequently Asked Questions
- What is a robots.txt file?
- A robots.txt file is a plain text file placed at the root of a website (e.g., example.com/robots.txt) that tells search engine crawlers and other bots which pages or sections they are allowed or not allowed to access. It follows the Robots Exclusion Protocol standard. While well-behaved bots respect robots.txt, it is not a security mechanism: it is a suggestion, not enforcement.
- How do I block AI crawlers like GPTBot and CCBot?
- Add specific User-agent rules with "Disallow: /" for each AI crawler. Common AI bots to block: GPTBot (OpenAI), ChatGPT-User, Google-Extended (Google AI training), CCBot (Common Crawl), anthropic-ai, ClaudeBot (Anthropic), and Bytespider (ByteDance). Use the "Block AI Crawlers" preset in our generator for a quick setup.
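A block list covering the crawlers above looks like this (extend it as new bots appear):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /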
- What is the difference between Allow and Disallow in robots.txt?
- Disallow tells bots not to crawl a specific path. Allow overrides a Disallow for a more specific path. For example, "Disallow: /private/" blocks everything under /private/, but adding "Allow: /private/public-page" lets bots access that one page. An empty Disallow (Disallow:) means everything is allowed.
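For example:

User-agent: *
Disallow: /private/
Allow: /private/public-page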
- What does Crawl-delay do in robots.txt?
- Crawl-delay tells bots to wait a specified number of seconds between requests. For example, "Crawl-delay: 10" asks bots to wait 10 seconds between page fetches. This helps reduce server load. Note: Googlebot ignores Crawl-delay (use Google Search Console instead), but Bingbot and others respect it.
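For example, to ask Bingbot to wait 10 seconds between fetches:

User-agent: Bingbot
Crawl-delay: 10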
- Where do I put the robots.txt file?
- The robots.txt file must be placed at the root of your domain, accessible at https://yourdomain.com/robots.txt. It only applies to that specific domain and protocol — a robots.txt on example.com does not apply to subdomain.example.com. Each subdomain needs its own robots.txt file.
- Does robots.txt affect SEO?
- Yes, robots.txt directly impacts SEO. Blocking important pages prevents search engines from crawling and indexing them, making them invisible in search results. However, blocking low-value pages (admin panels, duplicate content, internal search results) can help search engines focus their crawl budget on your important content. Always ensure your key pages are accessible.
- Can I use robots.txt to remove pages from Google?
- Not directly. Blocking a page in robots.txt prevents crawling but does not remove it from search results if Google already knows about it. Google may still show the URL (without a snippet) in results. To remove a page, use the "noindex" meta tag or X-Robots-Tag header, then allow crawling so Google can see the noindex directive.
- What is the Sitemap directive in robots.txt?
- The Sitemap directive tells search engines where to find your XML sitemap. Add "Sitemap: https://example.com/sitemap.xml" at the end of your robots.txt. You can list multiple sitemaps. This helps search engines discover all your pages efficiently, especially for large sites.
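For example, at the end of your robots.txt (the second URL is only a placeholder for an additional sitemap):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml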