Build robots.txt files with allow/disallow rules and sitemap directives. Part of the DevTools Surf developer suite.
Use Cases
Block search engines from indexing staging or admin paths without requiring authentication.
Prevent crawlers from indexing duplicate content (pagination, faceted navigation) that causes SEO dilution.
Specify a Sitemap URL so search engines can efficiently discover the pages you want indexed.
Allow specific bots (Googlebot) while disallowing others (aggressive scrapers) using agent-specific rules.
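Agent-specific rules like these can be checked with Python's standard-library parser. This is a minimal sketch: the robots.txt content and the "ScraperBot" agent name are illustrative, and an empty Disallow line means "allow everything" for that agent group.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: Googlebot may crawl everything,
# every other agent is blocked from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("Googlebot", "/pricing"))   # Googlebot matches its own group
print(rp.can_fetch("ScraperBot", "/pricing"))  # falls through to the * group
```

Running a quick check like this before deploying catches rules that accidentally block the crawlers you meant to allow.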
Tips
Test your robots.txt against specific crawlers using Google Search Console's robots.txt tester before deploying — syntax errors can accidentally block all crawlers.
Use specific Disallow paths rather than broad wildcards where possible — overly broad rules can block pages you want indexed.
Include the Sitemap URL at the bottom of robots.txt — the Sitemap directive is independent of User-agent groups, so search engines read it regardless of which crawlers are disallowed.
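Putting the tips together, a file might look like this (the paths and sitemap URL are placeholders):

```text
User-agent: *
Disallow: /admin/
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```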
Fun Facts
The Robots Exclusion Protocol was proposed by Martijn Koster in 1994 on a web mailing list. It was not formally standardized until RFC 9309, authored by Google engineers, was published in 2022 — 28 years after adoption.
Googlebot, Bingbot, and Applebot all respect robots.txt, but malicious scrapers and vulnerability scanners routinely ignore it — robots.txt is a convention, not an access control mechanism.
Search engines discovered robots.txt as a standard by crawling websites and noticing the file at a consistent path. Koster himself called the rapid adoption 'an experiment that somehow worked.'
FAQ
Does robots.txt prevent a page from appearing in search results?
No — blocked pages can still appear in results if other sites link to them. To prevent indexing, use a noindex meta tag or X-Robots-Tag header on the page itself — and note the page must remain crawlable, since a crawler blocked by robots.txt never sees the noindex. Robots.txt only prevents crawling, not indexing.
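The two page-level alternatives look like this (illustrative snippets):

```text
<!-- noindex meta tag, in the page's <head> -->
<meta name="robots" content="noindex">

HTTP response header equivalent:
X-Robots-Tag: noindex
```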
Can I use wildcards in robots.txt?
Yes — * matches any sequence of characters, and $ matches end-of-URL. For example, Disallow: /*.pdf$ blocks every URL ending in .pdf. Both wildcards are standardized in RFC 9309 and supported by all major search engine crawlers.
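The wildcard semantics can be sketched in a few lines by translating a robots pattern into a regular expression. This is a simplified illustration of RFC 9309-style matching, not a library API; the helper names are hypothetical.

```python
import re

def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Trailing $ anchors the pattern to the end of the URL
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # * becomes .*; every other character is matched literally
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile(body + ("$" if anchored else ""))

def path_matches(pattern: str, path: str) -> bool:
    # Robots rules match from the start of the URL path
    return pattern_to_regex(pattern).match(path) is not None

print(path_matches("/admin/*", "/admin/users"))    # True
print(path_matches("*.pdf$", "/files/report.pdf")) # True
print(path_matches("*.pdf$", "/report.pdf?dl=1"))  # False: $ anchors the end
```

Note that Python's built-in `urllib.robotparser` follows the original 1994 rules and does not implement these wildcards, which is why a manual sketch is shown here.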