How to write robots.txt
Create a robots.txt file to control search engine bots without breaking SEO. Explore user-agent rules, sitemap directives, and crawl-delay in a free online guide.
A robots.txt file is a 1994-vintage plain-text protocol that still controls whether Googlebot, Bingbot, and a zoo of other crawlers touch your site. It lives at exactly one path — /robots.txt at your root domain — and a single typo can either leak your staging environment into search results or de-index your entire production site. The syntax looks trivial, but the rules around precedence, wildcards, and the difference between crawling and indexing trip up even experienced SEOs. This guide covers the full directive set (User-agent, Disallow, Allow, Sitemap, Crawl-delay), wildcard and anchor matching, why Noindex was moved out of robots.txt in 2019, and the common patterns that keep staging environments private without breaking production.
What robots.txt actually controls
Robots.txt tells well-behaved crawlers which URLs they may request. It does not enforce anything — malicious scrapers ignore it — and it does not directly stop a page from appearing in search results. A URL blocked in robots.txt can still be indexed if Google discovers it via external links; the search result simply has no snippet because the crawler never read the page body.
If you need a page to stay out of the index, use an HTTP X-Robots-Tag: noindex header or a <meta name="robots" content="noindex"> tag. Crucially, leave the URL crawlable so Google can see the directive.
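As a rough illustration of the header approach, here is a minimal WSGI sketch that adds X-Robots-Tag: noindex to selected responses. The NOINDEX_PATHS set and the app function are hypothetical names for this example, and a real site would normally set the header in its framework or web server configuration.

```python
# Hypothetical set of paths that should stay out of the index.
NOINDEX_PATHS = {"/thanks"}

def app(environ, start_response):
    """Minimal WSGI app: serve a page and flag some paths as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path in NOINDEX_PATHS:
        # The page stays crawlable; the header asks search engines not to index it.
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>ok</title>"]
```

The path must not also be disallowed in robots.txt, otherwise the crawler never requests the page and never sees the header.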
File location and format
The file must be at https://example.com/robots.txt. Subdomains need their own — blog.example.com/robots.txt is separate from example.com/robots.txt. The content type should be text/plain and encoding UTF-8. A BOM at the start is tolerated but avoid it.
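The per-host rule can be sketched in a few lines: given any page URL, the governing robots.txt sits at the root of the same scheme and host. The function name robots_txt_url is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

blog_robots = robots_txt_url("https://blog.example.com/posts/1?page=2#top")
root_robots = robots_txt_url("https://example.com/pricing")
```

Here blog_robots and root_robots point at two different files, which is why a subdomain is not covered by the file on the root domain.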
The maximum file size Google honors is 500 KiB. Anything past that is ignored, and the cut can land mid-rule. Keep real-world files well under 50 KiB.
User-agent: targeting specific crawlers
Every rule block starts with one or more User-agent lines. The value is a substring match against the crawler’s product token, case-insensitive.
    User-agent: Googlebot
    Disallow: /admin/

    User-agent: Bingbot
    Disallow: /admin/
    Crawl-delay: 10

    User-agent: *
    Disallow: /private/
The * wildcard matches every crawler that has no more specific block. A crawler picks the single most specific matching group and obeys only that one — rules do not merge across groups. If Googlebot matches both its own block and the * block, it follows only the Googlebot block.
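The group-selection behaviour can be sketched as follows. This is a simplified model, not any crawler's actual parser: it uses the case-insensitive substring comparison described above, treats the longest matching token as the most specific, and the names parse_groups and select_group are invented for the example.

```python
def parse_groups(text):
    """Split robots.txt text into (agents, rules) groups."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # a rule line already closed the previous group header
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow", "crawl-delay") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))
    return groups

def select_group(groups, crawler):
    """Return the rules of the single most specific group for this crawler."""
    crawler = crawler.lower()
    best_len, best_rules = -1, []
    for agents, rules in groups:
        for agent in agents:
            if agent == "*":
                length = 0            # wildcard is the least specific match
            elif agent in crawler:
                length = len(agent)   # longer token means more specific
            else:
                continue
            if length > best_len:
                best_len, best_rules = length, rules
    return best_rules

ROBOTS = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: Bingbot
Disallow: /admin/
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""
groups = parse_groups(ROBOTS)
googlebot_rules = select_group(groups, "Googlebot")
other_rules = select_group(groups, "DuckDuckBot")
```

In this sketch Googlebot receives only its own /admin/ rule and never inherits /private/ from the * group, while an unnamed crawler falls through to the * group.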
Disallow and Allow
Disallow gives a path prefix the crawler must not request. Allow overrides a broader Disallow for a narrower path. Both are prefix matches starting from the root of the domain.
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
When rules conflict, Google uses the longest-matching path rule. Above, a request to /wp-admin/admin-ajax.php matches Allow (24 characters) more specifically than Disallow (10 characters), so it is allowed.
A bare Disallow: with no path means “nothing is disallowed”, so a group containing only that line lets the crawler fetch everything. Disallow: / blocks the entire site.
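A minimal sketch of longest-match precedence for plain prefix rules (no wildcards) might look like this. The is_allowed function is a hypothetical helper, and the tie-break in favour of Allow follows Google's documented "least restrictive rule" behaviour; other crawlers may resolve ties differently.

```python
def is_allowed(rules, path):
    """Decide whether a path may be crawled, given (directive, prefix) rules."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix:
            continue  # an empty value, such as a bare "Disallow:", matches nothing
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and directive == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(prefix), directive == "allow"
    return allowed

rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
ajax_ok = is_allowed(rules, "/wp-admin/admin-ajax.php")   # longer Allow wins
options_ok = is_allowed(rules, "/wp-admin/options.php")   # only Disallow matches
```

The order of the lines in the file does not matter in this model; only the length of the matching rule does.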
Wildcards and end-of-URL anchor
Google, Bing, and most modern crawlers support two pattern characters beyond plain prefix matching:
- * matches any sequence of characters.
- $ anchors the pattern to the end of the URL.

    User-agent: *
    Disallow: /*?sessionid=
    Disallow: /*.pdf$
    Allow: /public/*.pdf$
The first line blocks any URL containing ?sessionid=. The second blocks URLs ending in .pdf — without $, /file.pdf.html would also match. The Allow line then re-opens PDFs under /public/.
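One way to reason about these patterns is to translate them into regular expressions. The sketch below is an approximation for experimenting with rules, not a reference implementation; pattern_to_regex and matches are names chosen for the example.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where the pattern had "*".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # Without "$" the pattern is a prefix match, so nothing is appended.
    return re.compile(body + (r"\Z" if anchored else ""))

def matches(pattern, url_path):
    """True if the pattern matches from the start of the path."""
    return pattern_to_regex(pattern).match(url_path) is not None

session_hit = matches("/*?sessionid=", "/cart?sessionid=abc")
pdf_hit = matches("/*.pdf$", "/docs/file.pdf")
pdf_html_hit = matches("/*.pdf$", "/file.pdf.html")   # "$" rejects the trailing .html
unanchored_hit = matches("/*.pdf", "/file.pdf.html")  # prefix match still succeeds
```

Note that /*?sessionid= contains a literal question mark, so in this model it only matches when sessionid is the first query parameter.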
Sitemap directive
Unlike the rule directives, Sitemap lines are global — they are not tied to any User-agent group and can appear anywhere in the file. Use absolute URLs.
    Sitemap: https://example.com/sitemap.xml
    Sitemap: https://example.com/sitemap-news.xml
You can list multiple sitemaps or point to a sitemap index. This is the only directive that actively tells crawlers where to find content — the rest only tell them where not to go.
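Because Sitemap lines are independent of any group, collecting them is a simple line scan. The sketch below assumes the file text is already in memory; sitemap_urls is a hypothetical helper name.

```python
def sitemap_urls(robots_text):
    """Return every Sitemap URL in the file, regardless of position."""
    urls = []
    for raw in robots_text.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = raw.strip().partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

found = sitemap_urls(
    "User-agent: *\n"
    "Disallow: /private/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "Sitemap: https://example.com/sitemap-news.xml\n"
)
```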
Crawl-delay — limited support
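Crawl-delay: N asks the crawler to wait N seconds between requests. Bing honors it, and Yandex has historically done so; Google ignores it entirely, and the Search Console crawl rate limiter that used to be the alternative was retired in January 2024. To slow Googlebot temporarily, Google recommends returning 500, 503, or 429 responses instead. Baidu interprets the directive differently. In practice, scope it to the bots that respect it.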
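If you write your own crawler and want to respect the directive, Python's standard urllib.robotparser module can read it. The sketch below parses an in-memory file rather than fetching one; note that this module does not implement Google-style wildcards or longest-match precedence, so it is used here only for the delay lookup.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /admin/
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay declared in the Bingbot group, in seconds.
bing_delay = parser.crawl_delay("Bingbot")
# Any other crawler falls back to the * group, which sets no delay.
other_delay = parser.crawl_delay("Googlebot")
```

A polite fetch loop would sleep for the returned number of seconds between requests and pick its own default when the value is None.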
Noindex is no longer supported here
Google stopped obeying Noindex: directives in robots.txt on September 1, 2019. If you still have Noindex: /thanks lines, move that control to a <meta name="robots" content="noindex"> tag or an X-Robots-Tag header and make sure the URL is crawlable. A page that is both noindex and robots-disallowed is the worst of both worlds: Google cannot see the noindex tag, so it may leave the URL in the index as a snippetless link.
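To audit an existing file for leftover lines, a small scan is enough. This is a sketch with a hypothetical helper name, find_noindex_lines; it only reports the lines and leaves the migration to a meta tag or header to you.

```python
def find_noindex_lines(robots_text):
    """Return (line_number, path) for every legacy Noindex: line."""
    hits = []
    for number, raw in enumerate(robots_text.splitlines(), start=1):
        key, sep, value = raw.partition(":")
        if sep and key.strip().lower() == "noindex":
            hits.append((number, value.strip()))
    return hits

legacy = find_noindex_lines(
    "User-agent: *\n"
    "Noindex: /thanks\n"
    "Disallow: /admin/\n"
)
```

Each path reported this way needs a noindex tag or header on the page itself, and must not also appear in a Disallow rule.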
Patterns for staging and preview
For staging subdomains (staging.example.com), serve:
    User-agent: *
    Disallow: /
Combine with HTTP Basic Auth or IP allowlisting — robots.txt alone is a courtesy, not a gate. If a competitor or scraper finds the staging URL, they will ignore your robots.txt and crawl it anyway.
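A common way to avoid shipping the wrong file is to choose the robots.txt body from the request host. The sketch below assumes a hypothetical set of production hostnames (www.example.com is an assumption added for illustration) and defaults to blocking everything for any host it does not recognise.

```python
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"
PRODUCTION_ROBOTS = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)
# Assumed production hostnames; everything else is treated as non-production.
PRODUCTION_HOSTS = {"example.com", "www.example.com"}

def robots_for_host(host):
    """Pick the robots.txt body for the Host header of a request."""
    hostname = host.split(":", 1)[0].lower()  # strip any port
    if hostname in PRODUCTION_HOSTS:
        return PRODUCTION_ROBOTS
    return STAGING_ROBOTS  # fail closed for staging, previews, unknown hosts
```

Defaulting to the blocking file means a new preview domain is closed to well-behaved crawlers until someone explicitly marks it as production. It still needs authentication for real protection.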
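For a production site with a private admin area and a search results page you do not want crawled:

    User-agent: *
    Disallow: /admin/
    Disallow: /cart
    Disallow: /search
    Disallow: /*?utm_
    Allow: /

    Sitemap: https://example.com/sitemap.xml

The /*?utm_ line keeps crawlers away from the duplicate-content URLs that UTM-tagged inbound links create, but only when a utm_ parameter comes first in the query string. Add Disallow: /*&utm_ to catch it in later positions. The Allow: / line is redundant, since anything not disallowed is already allowed, but it is harmless.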
Testing your file
Google Search Console has a robots.txt report under Settings that shows the last-fetched copy, any parse errors, and the timestamp. To test a specific URL, use the URL Inspection tool, which reports whether crawling of that URL is blocked by robots.txt. It does not name the matching rule, so trace that against your file by hand or with a parser.
Before deploying, run the full file through a syntax checker that understands Google’s precedence rules. A misplaced Allow/Disallow can look harmless but silently block a whole section.
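As a first line of defence before a full checker, a small lint pass can catch structural slips. The sketch below is not a precedence-aware validator; it only flags a handful of obvious problems, and lint_robots and its messages are invented for this example.

```python
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}
MAX_BYTES = 500 * 1024  # size limit quoted earlier in this guide

def lint_robots(text):
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    if len(text.encode("utf-8")) > MAX_BYTES:
        problems.append("file exceeds 500 KiB and may be cut off")
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if not sep:
            problems.append(f"line {number}: missing ':'")
        elif key not in KNOWN:
            problems.append(f"line {number}: unknown directive '{key}'")
        elif key == "user-agent":
            seen_agent = True
        elif key in ("allow", "disallow"):
            if not seen_agent:
                problems.append(f"line {number}: rule before any User-agent line")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/'")
        elif key == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append(f"line {number}: Sitemap should be an absolute URL")
    return problems

issues = lint_robots(
    "Disallow: /admin/\n"
    "User-agent: *\n"
    "Noindex: /thanks\n"
    "Disallow: admin\n"
    "Sitemap: /sitemap.xml\n"
)
```

Running it in a deploy pipeline and failing on a non-empty result is one option; it complements, rather than replaces, testing real URLs against the live file.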
Common mistakes
Blocking CSS and JS. Google needs to render pages to evaluate them. Disallowing /wp-content/ or /static/ can hide the styles and scripts Googlebot uses for layout, which hurts rankings. Leave asset directories crawlable.
Using robots.txt to hide sensitive URLs. The file is public. Listing /admin-secret-backup/ in a Disallow line is like putting a giant arrow on it. Use auth, not robots.txt, for security.
Expecting Disallow to remove pages from the index. Disallow stops crawling, not indexing. Already-indexed URLs can stay in search results for months as snippetless links. To remove a page, serve noindex and keep the URL crawlable until Google has dropped it, and only then add the Disallow rule.
Case-sensitive path mismatch. Paths are case-sensitive. Disallow: /Admin/ does not block /admin/. Match your actual URL casing.
Forgetting subdomain scope. Uploading robots.txt to the root does nothing for cdn.example.com. Each subdomain that serves HTTP needs its own file.
Trailing-slash surprises. Disallow: /foo blocks /foo, /foo/, /foobar, and /foo.html. If you meant only the folder, write Disallow: /foo/.
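The case-sensitivity and trailing-slash points both follow from plain prefix comparison, which a few lines can demonstrate. The blocked_by helper and the sample paths are illustrative only.

```python
def blocked_by(prefix, paths):
    """Return the paths a 'Disallow: <prefix>' rule would match."""
    # Plain, case-sensitive prefix comparison, as described above.
    return [path for path in paths if path.startswith(prefix)]

paths = ["/foo", "/foo/", "/foo/bar", "/foobar", "/foo.html", "/Foo/"]

without_slash = blocked_by("/foo", paths)   # also catches /foobar and /foo.html
with_slash = blocked_by("/foo/", paths)     # only the folder and its contents
```

Neither rule matches /Foo/, because the comparison is case-sensitive.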
Run the numbers
Draft and validate a production-ready file with the robots.txt generator. Pair with the sitemap URL generator so the Sitemap: line you add actually points at a well-formed file, and the URL parser to verify the exact path shape your rules will match against.